Feature engineering for electricity load forecasting#
The purpose of this notebook is to demonstrate how to use skrub and
polars to perform feature engineering for electricity load forecasting.
We will build a set of features (and targets) from different data sources:
Historical weather data for 10 medium to large urban areas in France;
Holidays and standard calendar features for France;
Historical electricity load data for the whole of France.
All these data sources cover a time range from March 23, 2021 to May 31, 2025.
Since our maximum forecasting horizon is 24 hours, we consider that the future weather data is known at a chosen prediction time. Similarly, the holidays and calendar features are known at prediction time for any point in the future.
Therefore, exogenous features derived from the weather and calendar data can be used to engineer “future covariates”. Since the load data is our prediction target, we can also use it to engineer “past covariates” such as lagged features and rolling aggregations. The future values of the load data (with respect to the prediction time) are used as targets for the forecasting model.
Environment setup#
If needed (i.e. when running under JupyterLite), we install some extra dependencies for this notebook. We also need the development version of skrub to be able to use the skrub expressions.
%pip install -q https://pypi.anaconda.org/ogrisel/simple/polars/1.24.0/polars-1.24.0-cp39-abi3-emscripten_3_1_58_wasm32.whl
%pip install -q https://pypi.anaconda.org/ogrisel/simple/skrub/0.6.dev0/skrub-0.6.dev0-py3-none-any.whl
%pip install -q altair holidays plotly nbformat
The following 3 imports are only needed to work around some limitations when using polars in a pyodide/jupyterlite notebook.
TODO: remove those workarounds once pyodide 0.28 is released with support for the latest polars version.
import datetime
import tzdata # noqa: F401
import pandas as pd
from pyarrow.parquet import read_table
import altair
import numpy as np
import polars as pl
import skrub
from pathlib import Path
import holidays
import warnings
from plotly.io import write_json, read_json # noqa: F401
from tutorial_helpers import (
binned_coverage,
plot_lorenz_curve,
plot_reliability_diagram,
plot_residuals_vs_predicted,
plot_binned_residuals,
plot_horizon_forecast,
collect_cv_predictions,
)
# Ignore warnings from pkg_resources triggered by Python 3.13's multiprocessing.
warnings.filterwarnings("ignore", category=UserWarning, module="pkg_resources")
Calendar and holidays features#
We leverage the holidays package to enrich the time range with some
calendar features such as public holidays in France. We also add some
features that are useful for time series forecasting such as the day of the
week, the day of the year, and the hour of the day.
Note that the holidays package requires us to extract the date for the
French timezone.
Similarly for the calendar features: all the time features are extracted from the time in the French timezone, since it is likely that electricity usage patterns are influenced by inhabitants’ daily routines aligned with the local timezone.
@skrub.deferred
def prepare_french_calendar_data(time):
fr_time = pl.col("time").dt.convert_time_zone("Europe/Paris")
fr_year_min = time.select(fr_time.dt.year().min()).item()
fr_year_max = time.select(fr_time.dt.year().max()).item()
holidays_fr = holidays.country_holidays(
"FR", years=range(fr_year_min, fr_year_max + 1)
)
return time.with_columns(
[
fr_time.dt.hour().alias("cal_hour_of_day"),
fr_time.dt.weekday().alias("cal_day_of_week"),
fr_time.dt.ordinal_day().alias("cal_day_of_year"),
fr_time.dt.year().alias("cal_year"),
fr_time.dt.date().is_in(holidays_fr.keys()).alias("cal_is_holiday"),
],
)
calendar = prepare_french_calendar_data(time)
calendar
Show graph
| time | cal_hour_of_day | cal_day_of_week | cal_day_of_year | cal_year | cal_is_holiday |
|---|---|---|---|---|---|
| 2021-03-23 00:00:00+00:00 | 1 | 2 | 82 | 2021 | False |
| 2021-03-23 01:00:00+00:00 | 2 | 2 | 82 | 2021 | False |
| 2021-03-23 02:00:00+00:00 | 3 | 2 | 82 | 2021 | False |
| 2021-03-23 03:00:00+00:00 | 4 | 2 | 82 | 2021 | False |
| 2021-03-23 04:00:00+00:00 | 5 | 2 | 82 | 2021 | False |
| 2025-05-31 19:00:00+00:00 | 21 | 6 | 151 | 2025 | False |
| 2025-05-31 20:00:00+00:00 | 22 | 6 | 151 | 2025 | False |
| 2025-05-31 21:00:00+00:00 | 23 | 6 | 151 | 2025 | False |
| 2025-05-31 22:00:00+00:00 | 0 | 7 | 152 | 2025 | False |
| 2025-05-31 23:00:00+00:00 | 1 | 7 | 152 | 2025 | False |
| Column | Column name | dtype | Null values | Unique values | Mean | Std | Min | Median | Max |
|---|---|---|---|---|---|---|---|---|---|
| 0 | time | Datetime | 0 (0.0%) | 36744 (100.0%) | | | 2021-03-23T00:00:00+00:00 | | 2025-05-31T23:00:00+00:00 |
| 1 | cal_hour_of_day | Int8 | 0 (0.0%) | 24 (< 0.1%) | 11.5 | 6.92 | 0.00 | 12.0 | 23.0 |
| 2 | cal_day_of_week | Int8 | 0 (0.0%) | 7 (< 0.1%) | 4.00 | 2.00 | 1.00 | 4.00 | 7.00 |
| 3 | cal_day_of_year | Int16 | 0 (0.0%) | 366 (1.0%) | 180. | 104. | 1.00 | 174. | 366. |
| 4 | cal_year | Int32 | 0 (0.0%) | 5 (< 0.1%) | 2.02e+03 | 1.26 | 2.02e+03 | 2.02e+03 | 2.02e+03 |
| 5 | cal_is_holiday | Boolean | 0 (0.0%) | 2 (< 0.1%) | | | | | |
Electricity load data#
Finally we load the electricity load data. This data will be used both as the target variable and to craft some lagged and window-aggregated features.
@skrub.deferred
def load_electricity_load_data(time, data_source_folder):
"""Load and aggregate historical load data from the raw CSV files."""
load_data_files = [
data_file
for data_file in sorted(data_source_folder.iterdir())
if data_file.name.startswith("Total Load - Day Ahead")
and data_file.name.endswith(".csv")
]
return time.join(
(
pl.concat(
[
pl.from_pandas(pd.read_csv(data_file, na_values=["N/A", "-"])).drop(
["Day-ahead Total Load Forecast [MW] - BZN|FR"]
)
for data_file in load_data_files
]
).select(
[
pl.col("Time (UTC)")
.str.split(by=" - ")
.list.first()
.str.to_datetime("%d.%m.%Y %H:%M", time_zone="UTC")
.alias("time"),
pl.col("Actual Total Load [MW] - BZN|FR").alias("load_mw"),
]
)
),
on="time",
)
Let’s load the data and check if there are missing values since we will use this data as the target variable for our forecasting model.
electricity_raw = load_electricity_load_data(time, data_source_folder)
electricity_raw.filter(pl.col("load_mw").is_null())
Show graph
| time | load_mw |
|---|---|
| 2021-05-12 08:00:00+00:00 | |
| 2021-05-19 04:00:00+00:00 | |
| 2021-06-03 16:00:00+00:00 | |
| 2021-10-31 00:00:00+00:00 | |
| 2021-10-31 01:00:00+00:00 | |
| 2023-03-26 00:00:00+00:00 | |
| 2023-04-17 12:00:00+00:00 | |
| 2023-04-17 13:00:00+00:00 | |
| 2024-12-31 23:00:00+00:00 | |
| 2025-03-30 02:00:00+00:00 |
| Column | Column name | dtype | Null values | Unique values | Mean | Std | Min | Median | Max |
|---|---|---|---|---|---|---|---|---|---|
| 0 | time | Datetime | 0 (0.0%) | 36 (100.0%) | | | 2021-05-12T08:00:00+00:00 | | 2025-03-30T02:00:00+00:00 |
| 1 | load_mw | Float64 | 36 (100.0%) | | | | | | |
So apparently there are a few missing measurements. Let’s use linear interpolation to fill those missing values.
electricity_raw.filter(
(pl.col("time") > pl.datetime(2021, 10, 30, hour=10, time_zone="UTC"))
& (pl.col("time") < pl.datetime(2021, 10, 31, hour=10, time_zone="UTC"))
).skb.eval().plot.line(x="time:T", y="load_mw:Q")
electricity = electricity_raw.with_columns([pl.col("load_mw").interpolate()])
electricity.filter(
(pl.col("time") > pl.datetime(2021, 10, 30, hour=10, time_zone="UTC"))
& (pl.col("time") < pl.datetime(2021, 10, 31, hour=10, time_zone="UTC"))
).skb.eval().plot.line(x="time:T", y="load_mw:Q")
Remark: interpolating missing values in the target column that we will use to train and evaluate our models can bias the learning problem and make our cross-validation metrics misrepresent the performance of the deployed predictive system.
A potentially better approach would be to keep the missing values in the dataset and use a sample_weight mask to keep a contiguous dataset while ignoring the time periods with missing values when training or evaluating the model.
Lagged features#
We can now create some lagged features from the electricity load data.
We will create 3 hourly lagged features, 1 daily lagged feature, and 1 weekly lagged feature. We will also create rolling median and inter-quartile range (IQR) features over the last 24 hours and over the last 7 days.
def iqr(col, *, window_size: int):
"""Inter-quartile range (IQR) of a column."""
return col.rolling_quantile(0.75, window_size=window_size) - col.rolling_quantile(
0.25, window_size=window_size
)
electricity_lagged = electricity.with_columns(
[pl.col("load_mw").shift(i).alias(f"load_mw_lag_{i}h") for i in range(1, 4)]
+ [
pl.col("load_mw").shift(24).alias("load_mw_lag_1d"),
pl.col("load_mw").shift(24 * 7).alias("load_mw_lag_1w"),
pl.col("load_mw")
.rolling_median(window_size=24)
.alias("load_mw_rolling_median_24h"),
pl.col("load_mw")
.rolling_median(window_size=24 * 7)
.alias("load_mw_rolling_median_7d"),
iqr(pl.col("load_mw"), window_size=24).alias("load_mw_iqr_24h"),
iqr(pl.col("load_mw"), window_size=24 * 7).alias("load_mw_iqr_7d"),
],
)
electricity_lagged
Show graph
| time | load_mw | load_mw_lag_1h | load_mw_lag_2h | load_mw_lag_3h | load_mw_lag_1d | load_mw_lag_1w | load_mw_rolling_median_24h | load_mw_rolling_median_7d | load_mw_iqr_24h | load_mw_iqr_7d |
|---|---|---|---|---|---|---|---|---|---|---|
| 2021-03-23 00:00:00+00:00 | 59823.0 | |||||||||
| 2021-03-23 01:00:00+00:00 | 59369.0 | 59823.0 | ||||||||
| 2021-03-23 02:00:00+00:00 | 57550.0 | 59369.0 | 59823.0 | |||||||
| 2021-03-23 03:00:00+00:00 | 57188.0 | 57550.0 | 59369.0 | 59823.0 | ||||||
| 2021-03-23 04:00:00+00:00 | 60367.0 | 57188.0 | 57550.0 | 59369.0 | ||||||
| 2025-05-31 19:00:00+00:00 | 39069.0 | 39980.0 | 40890.0 | 40175.0 | 41584.0 | 39144.0 | 39356.0 | 40659.0 | 4231.0 | 7238.0 |
| 2025-05-31 20:00:00+00:00 | 40387.0 | 39069.0 | 39980.0 | 40890.0 | 42931.0 | 40286.0 | 39356.0 | 40659.0 | 4159.0 | 7238.0 |
| 2025-05-31 21:00:00+00:00 | 41174.0 | 40387.0 | 39069.0 | 39980.0 | 43812.0 | 41468.0 | 39356.0 | 40659.0 | 4159.0 | 7238.0 |
| 2025-05-31 22:00:00+00:00 | 39664.0 | 41174.0 | 40387.0 | 39069.0 | 41966.0 | 40346.0 | 39356.0 | 40659.0 | 4140.0 | 7238.0 |
| 2025-05-31 23:00:00+00:00 | 36067.0 | 39664.0 | 41174.0 | 40387.0 | 38248.0 | 37076.0 | 39356.0 | 40659.0 | 4823.0 | 7239.0 |
| Column | Column name | dtype | Null values | Unique values | Mean | Std | Min | Median | Max |
|---|---|---|---|---|---|---|---|---|---|
| 0 | time | Datetime | 0 (0.0%) | 36744 (100.0%) | | | 2021-03-23T00:00:00+00:00 | | 2025-05-31T23:00:00+00:00 |
| 1 | load_mw | Float64 | 0 (0.0%) | 23353 (63.6%) | 4.99e+04 | 1.05e+04 | 2.87e+04 | 4.81e+04 | 8.66e+04 |
| 2 | load_mw_lag_1h | Float64 | 1 (< 0.1%) | 23353 (63.6%) | 4.99e+04 | 1.05e+04 | 2.87e+04 | 4.81e+04 | 8.66e+04 |
| 3 | load_mw_lag_2h | Float64 | 2 (< 0.1%) | 23352 (63.6%) | 4.99e+04 | 1.05e+04 | 2.87e+04 | 4.81e+04 | 8.66e+04 |
| 4 | load_mw_lag_3h | Float64 | 3 (< 0.1%) | 23352 (63.6%) | 4.99e+04 | 1.05e+04 | 2.87e+04 | 4.81e+04 | 8.66e+04 |
| 5 | load_mw_lag_1d | Float64 | 24 (< 0.1%) | 23342 (63.5%) | 4.99e+04 | 1.05e+04 | 2.87e+04 | 4.81e+04 | 8.66e+04 |
| 6 | load_mw_lag_1w | Float64 | 168 (0.5%) | 23293 (63.4%) | 4.99e+04 | 1.05e+04 | 2.87e+04 | 4.82e+04 | 8.66e+04 |
| 7 | load_mw_rolling_median_24h | Float64 | 23 (< 0.1%) | 9644 (26.2%) | 5.06e+04 | 9.28e+03 | 3.37e+04 | 4.75e+04 | 7.84e+04 |
| 8 | load_mw_rolling_median_7d | Float64 | 167 (0.5%) | 7138 (19.4%) | 5.01e+04 | 8.82e+03 | 3.85e+04 | 4.60e+04 | 7.39e+04 |
| 9 | load_mw_iqr_24h | Float64 | 23 (< 0.1%) | 5922 (16.1%) | 6.52e+03 | 1.56e+03 | 2.32e+03 | 6.43e+03 | 1.60e+04 |
| 10 | load_mw_iqr_7d | Float64 | 167 (0.5%) | 5327 (14.5%) | 8.30e+03 | 1.41e+03 | 5.04e+03 | 8.27e+03 | 1.86e+04 |
altair.Chart(electricity_lagged.tail(100).skb.preview()).transform_fold(
[
"load_mw",
"load_mw_lag_1h",
"load_mw_lag_2h",
"load_mw_lag_3h",
"load_mw_lag_1d",
"load_mw_lag_1w",
"load_mw_rolling_median_24h",
"load_mw_rolling_median_7d",
"load_mw_iqr_24h",
"load_mw_iqr_7d",
],
as_=["key", "load_mw"],
).mark_line(tooltip=True).encode(x="time:T", y="load_mw:Q", color="key:N").interactive()
Important remark about lagged features engineering and system lag#
When working with historical data, we often have access to all the past measurements in the dataset. However, when we want to use the lagged features in a forecasting model, we need to be careful about the length of the system lag: the time between when a timestamped measurement is made in the real world and when the record becomes available to the downstream application (in our case, a deployed predictive pipeline).
System lag is rarely explicitly represented in the data sources, even though such delays can be as large as several hours or even days, and can sometimes be irregular. For instance, if there is a human intervention in the data recording process, holidays and weekends can occasionally add significant delay.
If the system lag is larger than some of the lags used in the feature engineering, the corresponding features will be filled with missing values once deployed. More importantly, if the system lag is not handled explicitly, those missing values will only be present in the features computed for the deployed system, and not in the features computed to train and backtest the system before deployment.
This structural discrepancy can severely degrade the performance of the deployed model compared to the performance estimated from backtesting on the historical data.
We will set this problem aside for now but discuss it again in a later section of this tutorial.
Investigating outliers in the lagged features#
Let’s use the skrub.TableReport tool to look at the plots of the marginal
distribution of the lagged features.
from skrub import TableReport
TableReport(electricity_lagged.skb.eval())
| time | load_mw | load_mw_lag_1h | load_mw_lag_2h | load_mw_lag_3h | load_mw_lag_1d | load_mw_lag_1w | load_mw_rolling_median_24h | load_mw_rolling_median_7d | load_mw_iqr_24h | load_mw_iqr_7d |
|---|---|---|---|---|---|---|---|---|---|---|
| 2021-03-23 00:00:00+00:00 | 59823.0 | |||||||||
| 2021-03-23 01:00:00+00:00 | 59369.0 | 59823.0 | ||||||||
| 2021-03-23 02:00:00+00:00 | 57550.0 | 59369.0 | 59823.0 | |||||||
| 2021-03-23 03:00:00+00:00 | 57188.0 | 57550.0 | 59369.0 | 59823.0 | ||||||
| 2021-03-23 04:00:00+00:00 | 60367.0 | 57188.0 | 57550.0 | 59369.0 | ||||||
| 2025-05-31 19:00:00+00:00 | 39069.0 | 39980.0 | 40890.0 | 40175.0 | 41584.0 | 39144.0 | 39356.0 | 40659.0 | 4231.0 | 7238.0 |
| 2025-05-31 20:00:00+00:00 | 40387.0 | 39069.0 | 39980.0 | 40890.0 | 42931.0 | 40286.0 | 39356.0 | 40659.0 | 4159.0 | 7238.0 |
| 2025-05-31 21:00:00+00:00 | 41174.0 | 40387.0 | 39069.0 | 39980.0 | 43812.0 | 41468.0 | 39356.0 | 40659.0 | 4159.0 | 7238.0 |
| 2025-05-31 22:00:00+00:00 | 39664.0 | 41174.0 | 40387.0 | 39069.0 | 41966.0 | 40346.0 | 39356.0 | 40659.0 | 4140.0 | 7238.0 |
| 2025-05-31 23:00:00+00:00 | 36067.0 | 39664.0 | 41174.0 | 40387.0 | 38248.0 | 37076.0 | 39356.0 | 40659.0 | 4823.0 | 7239.0 |
| Column 1 | Column 2 | Cramér's V | Pearson's Correlation |
|---|---|---|---|
| load_mw_lag_1h | load_mw_lag_2h | 0.819 | 0.982 |
| load_mw_lag_2h | load_mw_lag_3h | 0.808 | 0.981 |
| load_mw_lag_1h | load_mw_lag_3h | 0.701 | 0.940 |
| load_mw | load_mw_lag_1h | 0.700 | 0.981 |
| load_mw_lag_1d | load_mw_rolling_median_24h | 0.636 | 0.889 |
| load_mw_lag_1w | load_mw_rolling_median_7d | 0.617 | 0.831 |
| load_mw | load_mw_lag_1d | 0.603 | 0.932 |
| load_mw | load_mw_lag_2h | 0.596 | 0.940 |
| load_mw_lag_1h | load_mw_lag_1d | 0.596 | 0.917 |
| load_mw_rolling_median_24h | load_mw_rolling_median_7d | 0.565 | 0.918 |
| load_mw_lag_2h | load_mw_lag_1d | 0.520 | 0.880 |
| load_mw_lag_1h | load_mw_rolling_median_24h | 0.506 | 0.889 |
| load_mw_lag_1d | load_mw_rolling_median_7d | 0.505 | 0.853 |
| load_mw | load_mw_lag_1w | 0.502 | 0.884 |
| load_mw_lag_2h | load_mw_rolling_median_24h | 0.499 | 0.890 |
| load_mw | load_mw_lag_3h | 0.498 | 0.894 |
| load_mw_lag_3h | load_mw_rolling_median_24h | 0.494 | 0.893 |
| load_mw_lag_1d | load_mw_lag_1w | 0.492 | 0.853 |
| load_mw_rolling_median_7d | load_mw_iqr_7d | 0.488 | 0.199 |
| load_mw | load_mw_rolling_median_24h | 0.488 | 0.886 |
Let’s extract the dates where the inter-quartile range of the load is greater than 15,000 MW.
electricity_lagged.filter(pl.col("load_mw_iqr_7d") > 15_000)[
"time"
].dt.date().unique().sort().to_list().skb.eval()
[datetime.date(2021, 12, 26),
datetime.date(2021, 12, 27),
datetime.date(2021, 12, 28),
datetime.date(2022, 1, 7),
datetime.date(2022, 1, 8),
datetime.date(2023, 1, 19),
datetime.date(2023, 1, 20),
datetime.date(2023, 1, 21),
datetime.date(2024, 1, 10),
datetime.date(2024, 1, 11),
datetime.date(2024, 1, 12),
datetime.date(2024, 1, 13)]
We observe four date ranges with a high inter-quartile range. Let’s plot the electricity load and the lagged features for the first date range, along with the temperature data for the different cities.
altair.Chart(
electricity_lagged.filter(
(pl.col("time") > pl.datetime(2021, 12, 1, time_zone="UTC"))
& (pl.col("time") < pl.datetime(2021, 12, 31, time_zone="UTC"))
).skb.eval()
).transform_fold(
[
"load_mw",
"load_mw_iqr_7d",
],
).mark_line(
tooltip=True
).encode(
x="time:T", y="value:Q", color="key:N"
).interactive()
altair.Chart(
all_city_weather.filter(
(pl.col("time") > pl.datetime(2021, 12, 1, time_zone="UTC"))
& (pl.col("time") < pl.datetime(2021, 12, 31, time_zone="UTC"))
).skb.eval()
).transform_fold(
[f"weather_temperature_2m_{city_name}" for city_name in city_names.skb.eval()],
).mark_line(
tooltip=True
).encode(
x="time:T", y="value:Q", color="key:N"
).interactive()
Based on the plots above, we can see that the electricity load was high just before the Christmas holidays due to low temperatures. Then the load suddenly dropped because temperatures rose right at the start of the end-of-year holidays.
So those outliers do not seem to be caused by a data quality issue but rather by a real change in electricity demand. We could conduct a similar analysis for the other date ranges with a high inter-quartile range, but we will skip that for now.
If we had observed significant data quality issues over extended periods of
time, they could have been addressed by removing the corresponding rows from
the dataset. However, this would make the lagged and windowing feature
engineering challenging to reimplement correctly. A better approach would be
to keep a contiguous dataset and assign 0 weights to the affected rows when
fitting or evaluating the trained models via the sample_weight parameter.
Final dataset#
We now assemble the dataset that will be used to train and evaluate the forecasting models via backtesting.
prediction_start_time = skrub.var(
"prediction_start_time", historical_data_start_time.skb.eval() + pl.duration(days=7)
)
prediction_end_time = skrub.var(
"prediction_end_time", historical_data_end_time.skb.eval() - pl.duration(hours=24)
)
@skrub.deferred
def define_prediction_time_range(prediction_start_time, prediction_end_time):
return pl.DataFrame().with_columns(
pl.datetime_range(
start=prediction_start_time,
end=prediction_end_time,
time_zone="UTC",
interval="1h",
).alias("prediction_time"),
)
prediction_time = define_prediction_time_range(
prediction_start_time, prediction_end_time
).skb.subsample(n=1000, how="head")
prediction_time
Show graph
| prediction_time |
|---|
| 2021-03-30 00:00:00+00:00 |
| 2021-03-30 01:00:00+00:00 |
| 2021-03-30 02:00:00+00:00 |
| 2021-03-30 03:00:00+00:00 |
| 2021-03-30 04:00:00+00:00 |
| 2021-05-10 11:00:00+00:00 |
| 2021-05-10 12:00:00+00:00 |
| 2021-05-10 13:00:00+00:00 |
| 2021-05-10 14:00:00+00:00 |
| 2021-05-10 15:00:00+00:00 |
| Column | Column name | dtype | Null values | Unique values | Mean | Std | Min | Median | Max |
|---|---|---|---|---|---|---|---|---|---|
| 0 | prediction_time | Datetime | 0 (0.0%) | 1000 (100.0%) | | | 2021-03-30T00:00:00+00:00 | | 2021-05-10T15:00:00+00:00 |
@skrub.deferred
def build_features(
    prediction_time,
    electricity_lagged,
    all_city_weather,
    calendar,
    future_feature_horizons=[1, 24],
):
    return (
        # Past covariates: join the lagged and rolling load features
        # computed at each prediction time.
        prediction_time.join(
            electricity_lagged, left_on="prediction_time", right_on="time"
        )
        # Future covariates: shift each weather column by -h so that the
        # value observed h hours after the prediction time lands on the
        # prediction-time row.
        .join(
            all_city_weather.select(
                [pl.col("time")]
                + [
                    pl.col(c).shift(-h).alias(c + f"_future_{h}h")
                    for c in all_city_weather.columns
                    if c != "time"
                    for h in future_feature_horizons
                ]
            ),
            left_on="prediction_time",
            right_on="time",
        )
        # Same construction for the calendar features.
        .join(
            calendar.select(
                [pl.col("time")]
                + [
                    pl.col(c).shift(-h).alias(c + f"_future_{h}h")
                    for c in calendar.columns
                    if c != "time"
                    for h in future_feature_horizons
                ]
            ),
            left_on="prediction_time",
            right_on="time",
        )
    ).drop("prediction_time")
features = build_features(
    prediction_time=prediction_time,
    electricity_lagged=electricity_lagged,
    all_city_weather=all_city_weather,
    calendar=calendar,
).skb.mark_as_X()
features
The resulting `features` table has 140 columns: the current load and nine derived past covariates (`load_mw`, `load_mw_lag_1h`, `load_mw_lag_2h`, `load_mw_lag_3h`, `load_mw_lag_1d`, `load_mw_lag_1w`, `load_mw_rolling_median_24h`, `load_mw_rolling_median_7d`, `load_mw_iqr_24h`, `load_mw_iqr_7d`); the 1-hour and 24-hour future values of the six weather variables for each of the ten cities (120 columns such as `weather_temperature_2m_paris_future_1h`); and the future calendar features (10 columns such as `cal_hour_of_day_future_1h` and `cal_is_holiday_future_24h`).
The per-column summaries from the skrub report can be condensed as follows. For the load target and its past covariates (all `Float64`, no null values, in MW):

| Column | Mean ± Std | Median ± IQR | Min | Max |
|---|---|---|---|---|
| load_mw | 5.08e+04 ± 6.37e+03 | 5.06e+04 ± 8.51e+03 | 3.35e+04 | 6.95e+04 |
| load_mw_lag_1h | 5.08e+04 ± 6.37e+03 | 5.06e+04 ± 8.48e+03 | 3.35e+04 | 6.95e+04 |
| load_mw_lag_2h | 5.08e+04 ± 6.37e+03 | 5.06e+04 ± 8.48e+03 | 3.35e+04 | 6.95e+04 |
| load_mw_lag_3h | 5.08e+04 ± 6.37e+03 | 5.06e+04 ± 8.48e+03 | 3.35e+04 | 6.95e+04 |
| load_mw_lag_1d | 5.10e+04 ± 6.29e+03 | 5.08e+04 ± 8.27e+03 | 3.35e+04 | 6.95e+04 |
| load_mw_lag_1w | 5.20e+04 ± 6.28e+03 | 5.16e+04 ± 8.39e+03 | 3.71e+04 | 7.04e+04 |
| load_mw_rolling_median_24h | 5.08e+04 ± 4.96e+03 | 5.04e+04 ± 5.76e+03 | 3.97e+04 | 6.15e+04 |
| load_mw_rolling_median_7d | 5.13e+04 ± 2.96e+03 | 4.98e+04 ± 5.04e+03 | 4.73e+04 | 5.62e+04 |
| load_mw_iqr_24h | 5.95e+03 ± 1.46e+03 | 5.60e+03 ± 1.70e+03 | 3.17e+03 | 1.47e+04 |
| load_mw_iqr_7d | 7.31e+03 ± 1.82e+03 | 6.79e+03 ± 1.92e+03 | 5.08e+03 | 1.29e+04 |

The future weather covariates (`Float32`) all show plausible ranges, for instance `weather_temperature_2m_paris_future_1h` with mean 10.8 ± 4.79 °C over [1.16, 26.6]. One notable exception: every `weather_soil_moisture_1_to_3cm_*` column is 100% null in this subsample, so these columns carry no information here.
- 0 (0.0%)
- Unique values
- 277 (27.7%)
- Mean ± Std
- 9.79 ± 5.90
- Median ± IQR
- 9.50 ± 7.40
- Min | Max
- -4.30 | 25.7
weather_precipitation_limoges_future_1h
Float32- Null values
- 0 (0.0%)
- Unique values
- 25 (2.5%)
- Mean ± Std
- 0.0960 ± 0.480
- Median ± IQR
- 0.00 ± 0.00
- Min | Max
- 0.00 | 6.70
weather_precipitation_limoges_future_24h
Float32- Null values
- 0 (0.0%)
- Unique values
- 25 (2.5%)
- Mean ± Std
- 0.103 ± 0.483
- Median ± IQR
- 0.00 ± 0.00
- Min | Max
- 0.00 | 6.70
weather_wind_speed_10m_limoges_future_1h
Float32- Null values
- 0 (0.0%)
- Unique values
- 466 (46.6%)
- Mean ± Std
- 7.90 ± 4.39
- Median ± IQR
- 6.62 ± 6.50
- Min | Max
- 0.00 | 21.6
weather_wind_speed_10m_limoges_future_24h
Float32- Null values
- 0 (0.0%)
- Unique values
- 474 (47.4%)
- Mean ± Std
- 7.94 ± 4.40
- Median ± IQR
- 6.83 ± 6.42
- Min | Max
- 0.00 | 21.6
weather_cloud_cover_limoges_future_1h
Float32- Null values
- 0 (0.0%)
- Unique values
- 83 (8.3%)
- Mean ± Std
- 54.4 ± 45.2
- Median ± IQR
- 69.0 ± 94.0
- Min | Max
- 0.00 | 100.
weather_cloud_cover_limoges_future_24h
Float32- Null values
- 0 (0.0%)
- Unique values
- 83 (8.3%)
- Mean ± Std
- 56.2 ± 45.0
- Median ± IQR
- 79.0 ± 94.0
- Min | Max
- 0.00 | 100.
weather_soil_moisture_1_to_3cm_limoges_future_1h
Float32- Null values
- 1,000 (100.0%)
weather_soil_moisture_1_to_3cm_limoges_future_24h
Float32- Null values
- 1,000 (100.0%)
weather_relative_humidity_2m_limoges_future_1h
Float32- Null values
- 0 (0.0%)
- Unique values
- 81 (8.1%)
- Mean ± Std
- 68.6 ± 23.4
- Median ± IQR
- 71.0 ± 42.0
- Min | Max
- 20.0 | 100.
weather_relative_humidity_2m_limoges_future_24h
Float32- Null values
- 0 (0.0%)
- Unique values
- 81 (8.1%)
- Mean ± Std
- 69.4 ± 23.5
- Median ± IQR
- 72.0 ± 42.0
- Min | Max
- 20.0 | 100.
weather_temperature_2m_nantes_future_1h
Float32- Null values
- 0 (0.0%)
- Unique values
- 245 (24.5%)
- Mean ± Std
- 9.76 ± 5.31
- Median ± IQR
- 10.2 ± 7.40
- Min | Max
- -2.47 | 22.7
weather_temperature_2m_nantes_future_24h
Float32- Null values
- 0 (0.0%)
- Unique values
- 243 (24.3%)
- Mean ± Std
- 9.67 ± 5.22
- Median ± IQR
- 10.1 ± 7.20
- Min | Max
- -2.47 | 22.7
weather_precipitation_nantes_future_1h
Float32- Null values
- 0 (0.0%)
- Unique values
- 14 (1.4%)
- Mean ± Std
- 0.0275 ± 0.139
- Median ± IQR
- 0.00 ± 0.00
- Min | Max
- 0.00 | 2.20
weather_precipitation_nantes_future_24h
Float32- Null values
- 0 (0.0%)
- Unique values
- 15 (1.5%)
- Mean ± Std
- 0.0328 ± 0.157
- Median ± IQR
- 0.00 ± 0.00
- Min | Max
- 0.00 | 2.20
weather_wind_speed_10m_nantes_future_1h
Float32- Null values
- 0 (0.0%)
- Unique values
- 713 (71.3%)
- Mean ± Std
- 16.0 ± 7.37
- Median ± IQR
- 14.3 ± 10.0
- Min | Max
- 1.48 | 42.3
weather_wind_speed_10m_nantes_future_24h
Float32- Null values
- 0 (0.0%)
- Unique values
- 720 (72.0%)
- Mean ± Std
- 16.1 ± 7.48
- Median ± IQR
- 14.5 ± 10.3
- Min | Max
- 1.48 | 42.3
weather_cloud_cover_nantes_future_1h
Float32- Null values
- 0 (0.0%)
- Unique values
- 87 (8.7%)
- Mean ± Std
- 49.7 ± 45.0
- Median ± IQR
- 32.0 ± 94.0
- Min | Max
- 0.00 | 100.
weather_cloud_cover_nantes_future_24h
Float32- Null values
- 0 (0.0%)
- Unique values
- 87 (8.7%)
- Mean ± Std
- 51.2 ± 44.9
- Median ± IQR
- 47.0 ± 94.0
- Min | Max
- 0.00 | 100.
weather_soil_moisture_1_to_3cm_nantes_future_1h
Float32- Null values
- 1,000 (100.0%)
weather_soil_moisture_1_to_3cm_nantes_future_24h
Float32- Null values
- 1,000 (100.0%)
weather_relative_humidity_2m_nantes_future_1h
Float32- Null values
- 0 (0.0%)
- Unique values
- 59 (5.9%)
- Mean ± Std
- 72.8 ± 15.7
- Median ± IQR
- 74.0 ± 28.0
- Min | Max
- 39.0 | 98.0
weather_relative_humidity_2m_nantes_future_24h
Float32- Null values
- 0 (0.0%)
- Unique values
- 59 (5.9%)
- Mean ± Std
- 72.9 ± 15.8
- Median ± IQR
- 74.0 ± 29.0
- Min | Max
- 39.0 | 98.0
weather_temperature_2m_strasbourg_future_1h
Float32- Null values
- 0 (0.0%)
- Unique values
- 236 (23.6%)
- Mean ± Std
- 9.53 ± 4.83
- Median ± IQR
- 9.04 ± 6.20
- Min | Max
- -0.563 | 26.2
weather_temperature_2m_strasbourg_future_24h
Float32- Null values
- 0 (0.0%)
- Unique values
- 230 (23.0%)
- Mean ± Std
- 9.46 ± 4.74
- Median ± IQR
- 9.04 ± 6.00
- Min | Max
- -0.563 | 26.2
weather_precipitation_strasbourg_future_1h
Float32- Null values
- 0 (0.0%)
- Unique values
- 21 (2.1%)
- Mean ± Std
- 0.0675 ± 0.246
- Median ± IQR
- 0.00 ± 0.00
- Min | Max
- 0.00 | 2.80
weather_precipitation_strasbourg_future_24h
Float32- Null values
- 0 (0.0%)
- Unique values
- 25 (2.5%)
- Mean ± Std
- 0.0925 ± 0.349
- Median ± IQR
- 0.00 ± 0.00
- Min | Max
- 0.00 | 4.30
weather_wind_speed_10m_strasbourg_future_1h
Float32- Null values
- 0 (0.0%)
- Unique values
- 556 (55.6%)
- Mean ± Std
- 10.3 ± 5.50
- Median ± IQR
- 9.69 ± 7.95
- Min | Max
- 0.509 | 29.9
weather_wind_speed_10m_strasbourg_future_24h
Float32- Null values
- 0 (0.0%)
- Unique values
- 558 (55.8%)
- Mean ± Std
- 10.3 ± 5.45
- Median ± IQR
- 9.69 ± 7.92
- Min | Max
- 0.509 | 29.9
weather_cloud_cover_strasbourg_future_1h
Float32- Null values
- 0 (0.0%)
- Unique values
- 85 (8.5%)
- Mean ± Std
- 64.0 ± 42.9
- Median ± IQR
- 99.0 ± 89.0
- Min | Max
- 0.00 | 100.
weather_cloud_cover_strasbourg_future_24h
Float32- Null values
- 0 (0.0%)
- Unique values
- 84 (8.4%)
- Mean ± Std
- 65.3 ± 42.8
- Median ± IQR
- 100. ± 89.0
- Min | Max
- 0.00 | 100.
weather_soil_moisture_1_to_3cm_strasbourg_future_1h
Float32- Null values
- 1,000 (100.0%)
weather_soil_moisture_1_to_3cm_strasbourg_future_24h
Float32- Null values
- 1,000 (100.0%)
weather_relative_humidity_2m_strasbourg_future_1h
Float32- Null values
- 0 (0.0%)
- Unique values
- 70 (7.0%)
- Mean ± Std
- 65.1 ± 17.3
- Median ± IQR
- 66.0 ± 26.0
- Min | Max
- 29.0 | 98.0
weather_relative_humidity_2m_strasbourg_future_24h
Float32- Null values
- 0 (0.0%)
- Unique values
- 71 (7.1%)
- Mean ± Std
- 66.0 ± 17.8
- Median ± IQR
- 67.0 ± 28.0
- Min | Max
- 29.0 | 99.0
weather_temperature_2m_brest_future_1h
Float32- Null values
- 0 (0.0%)
- Unique values
- 203 (20.3%)
- Mean ± Std
- 8.99 ± 4.30
- Median ± IQR
- 9.23 ± 6.10
- Min | Max
- 0.928 | 23.2
weather_temperature_2m_brest_future_24h
Float32- Null values
- 0 (0.0%)
- Unique values
- 199 (19.9%)
- Mean ± Std
- 8.89 ± 4.21
- Median ± IQR
- 9.03 ± 6.00
- Min | Max
- 0.928 | 23.2
weather_precipitation_brest_future_1h
Float32- Null values
- 0 (0.0%)
- Unique values
- 12 (1.2%)
- Mean ± Std
- 0.0332 ± 0.118
- Median ± IQR
- 0.00 ± 0.00
- Min | Max
- 0.00 | 1.40
weather_precipitation_brest_future_24h
Float32- Null values
- 0 (0.0%)
- Unique values
- 13 (1.3%)
- Mean ± Std
- 0.0387 ± 0.131
- Median ± IQR
- 0.00 ± 0.00
- Min | Max
- 0.00 | 1.50
weather_wind_speed_10m_brest_future_1h
Float32- Null values
- 0 (0.0%)
- Unique values
- 779 (77.9%)
- Mean ± Std
- 18.0 ± 8.97
- Median ± IQR
- 16.7 ± 12.7
- Min | Max
- 0.805 | 44.9
weather_wind_speed_10m_brest_future_24h
Float32- Null values
- 0 (0.0%)
- Unique values
- 786 (78.6%)
- Mean ± Std
- 18.2 ± 9.11
- Median ± IQR
- 17.0 ± 13.0
- Min | Max
- 0.805 | 44.9
weather_cloud_cover_brest_future_1h
Float32- Null values
- 0 (0.0%)
- Unique values
- 91 (9.1%)
- Mean ± Std
- 55.2 ± 43.4
- Median ± IQR
- 66.0 ± 93.0
- Min | Max
- 0.00 | 100.
weather_cloud_cover_brest_future_24h
Float32- Null values
- 0 (0.0%)
- Unique values
- 91 (9.1%)
- Mean ± Std
- 55.5 ± 43.4
- Median ± IQR
- 68.0 ± 93.0
- Min | Max
- 0.00 | 100.
weather_soil_moisture_1_to_3cm_brest_future_1h
Float32- Null values
- 1,000 (100.0%)
weather_soil_moisture_1_to_3cm_brest_future_24h
Float32- Null values
- 1,000 (100.0%)
weather_relative_humidity_2m_brest_future_1h
Float32- Null values
- 0 (0.0%)
- Unique values
- 57 (5.7%)
- Mean ± Std
- 72.6 ± 13.6
- Median ± IQR
- 74.0 ± 23.0
- Min | Max
- 43.0 | 99.0
weather_relative_humidity_2m_brest_future_24h
Float32- Null values
- 0 (0.0%)
- Unique values
- 57 (5.7%)
- Mean ± Std
- 72.9 ± 13.6
- Median ± IQR
- 75.0 ± 22.0
- Min | Max
- 43.0 | 99.0
weather_temperature_2m_bayonne_future_1h
Float32- Null values
- 0 (0.0%)
- Unique values
- 247 (24.7%)
- Mean ± Std
- 12.2 ± 5.01
- Median ± IQR
- 12.1 ± 5.00
- Min | Max
- -0.202 | 29.4
weather_temperature_2m_bayonne_future_24h
Float32- Null values
- 0 (0.0%)
- Unique values
- 241 (24.1%)
- Mean ± Std
- 12.1 ± 4.89
- Median ± IQR
- 12.0 ± 4.90
- Min | Max
- -0.202 | 29.4
weather_precipitation_bayonne_future_1h
Float32- Null values
- 0 (0.0%)
- Unique values
- 26 (2.6%)
- Mean ± Std
- 0.0919 ± 0.420
- Median ± IQR
- 0.00 ± 0.00
- Min | Max
- 0.00 | 4.40
weather_precipitation_bayonne_future_24h
Float32- Null values
- 0 (0.0%)
- Unique values
- 29 (2.9%)
- Mean ± Std
- 0.109 ± 0.468
- Median ± IQR
- 0.00 ± 0.00
- Min | Max
- 0.00 | 5.40
weather_wind_speed_10m_bayonne_future_1h
Float32- Null values
- 0 (0.0%)
- Unique values
- 537 (53.7%)
- Mean ± Std
- 10.1 ± 5.11
- Median ± IQR
- 9.00 ± 6.83
- Min | Max
- 0.509 | 33.3
weather_wind_speed_10m_bayonne_future_24h
Float32- Null values
- 0 (0.0%)
- Unique values
- 548 (54.8%)
- Mean ± Std
- 10.4 ± 5.45
- Median ± IQR
- 9.11 ± 7.36
- Min | Max
- 0.509 | 33.3
weather_cloud_cover_bayonne_future_1h
Float32- Null values
- 0 (0.0%)
- Unique values
- 87 (8.7%)
- Mean ± Std
- 61.4 ± 44.2
- Median ± IQR
- 97.0 ± 93.0
- Min | Max
- -1.00 | 100.
weather_cloud_cover_bayonne_future_24h
Float32- Null values
- 0 (0.0%)
- Unique values
- 87 (8.7%)
- Mean ± Std
- 62.4 ± 44.1
- Median ± IQR
- 100. ± 93.0
- Min | Max
- -1.00 | 100.
weather_soil_moisture_1_to_3cm_bayonne_future_1h
Float32- Null values
- 1,000 (100.0%)
weather_soil_moisture_1_to_3cm_bayonne_future_24h
Float32- Null values
- 1,000 (100.0%)
weather_relative_humidity_2m_bayonne_future_1h
Float32- Null values
- 0 (0.0%)
- Unique values
- 70 (7.0%)
- Mean ± Std
- 70.5 ± 16.1
- Median ± IQR
- 72.0 ± 25.0
- Min | Max
- 25.0 | 98.0
weather_relative_humidity_2m_bayonne_future_24h
Float32- Null values
- 0 (0.0%)
- Unique values
- 69 (6.9%)
- Mean ± Std
- 71.0 ± 15.9
- Median ± IQR
- 72.0 ± 25.0
- Min | Max
- 25.0 | 98.0
cal_hour_of_day_future_1h
Int8- Null values
- 0 (0.0%)
- Unique values
- 24 (2.4%)
- Mean ± Std
- 11.5 ± 6.90
- Median ± IQR
- 11.0 ± 11.0
- Min | Max
- 0.00 | 23.0
cal_hour_of_day_future_24h
Int8- Null values
- 0 (0.0%)
- Unique values
- 24 (2.4%)
- Mean ± Std
- 11.5 ± 6.90
- Median ± IQR
- 11.0 ± 11.0
- Min | Max
- 0.00 | 23.0
cal_day_of_week_future_1h
Int8- Null values
- 0 (0.0%)
- Unique values
- 7 (0.7%)
- Mean ± Std
- 4.02 ± 1.99
- Median ± IQR
- 4.00 ± 4.00
- Min | Max
- 1.00 | 7.00
cal_day_of_week_future_24h
Int8- Null values
- 0 (0.0%)
- Unique values
- 7 (0.7%)
- Mean ± Std
- 4.01 ± 2.00
- Median ± IQR
- 4.00 ± 4.00
- Min | Max
- 1.00 | 7.00
cal_day_of_year_future_1h
Int16- Null values
- 0 (0.0%)
- Unique values
- 42 (4.2%)
- Mean ± Std
- 109. ± 12.0
- Median ± IQR
- 109. ± 21.0
- Min | Max
- 89.0 | 130.
cal_day_of_year_future_24h
Int16- Null values
- 0 (0.0%)
- Unique values
- 42 (4.2%)
- Mean ± Std
- 110. ± 12.0
- Median ± IQR
- 110. ± 21.0
- Min | Max
- 90.0 | 131.
cal_year_future_1h
Int32- Null values
- 0 (0.0%)
cal_year_future_24h
Int32- Null values
- 0 (0.0%)
cal_is_holiday_future_1h
Boolean- Null values
- 0 (0.0%)
- Unique values
- 2 (0.2%)
cal_is_holiday_future_24h
Boolean- Null values
- 0 (0.0%)
- Unique values
- 2 (0.2%)
No columns match the selected filter: . You can change the column filter in the dropdown menu above.
| Column | Column name | dtype | Null values | Unique values | Mean | Std | Min | Median | Max |
|---|---|---|---|---|---|---|---|---|---|
| 0 | load_mw | Float64 | 0 (0.0%) | 965 (96.5%) | 5.08e+04 | 6.37e+03 | 3.35e+04 | 5.06e+04 | 6.95e+04 |
| 1 | load_mw_lag_1h | Float64 | 0 (0.0%) | 965 (96.5%) | 5.08e+04 | 6.37e+03 | 3.35e+04 | 5.06e+04 | 6.95e+04 |
| 2 | load_mw_lag_2h | Float64 | 0 (0.0%) | 965 (96.5%) | 5.08e+04 | 6.37e+03 | 3.35e+04 | 5.06e+04 | 6.95e+04 |
| 3 | load_mw_lag_3h | Float64 | 0 (0.0%) | 965 (96.5%) | 5.08e+04 | 6.37e+03 | 3.35e+04 | 5.06e+04 | 6.95e+04 |
| 4 | load_mw_lag_1d | Float64 | 0 (0.0%) | 966 (96.6%) | 5.10e+04 | 6.29e+03 | 3.35e+04 | 5.08e+04 | 6.95e+04 |
| 5 | load_mw_lag_1w | Float64 | 0 (0.0%) | 967 (96.7%) | 5.20e+04 | 6.28e+03 | 3.71e+04 | 5.16e+04 | 7.04e+04 |
| 6 | load_mw_rolling_median_24h | Float64 | 0 (0.0%) | 319 (31.9%) | 5.08e+04 | 4.96e+03 | 3.97e+04 | 5.04e+04 | 6.15e+04 |
| 7 | load_mw_rolling_median_7d | Float64 | 0 (0.0%) | 340 (34.0%) | 5.13e+04 | 2.96e+03 | 4.73e+04 | 4.98e+04 | 5.62e+04 |
| 8 | load_mw_iqr_24h | Float64 | 0 (0.0%) | 404 (40.4%) | 5.95e+03 | 1.46e+03 | 3.17e+03 | 5.60e+03 | 1.47e+04 |
| 9 | load_mw_iqr_7d | Float64 | 0 (0.0%) | 515 (51.5%) | 7.31e+03 | 1.82e+03 | 5.08e+03 | 6.79e+03 | 1.29e+04 |
| 10 | weather_temperature_2m_paris_future_1h | Float32 | 0 (0.0%) | 219 (21.9%) | 10.8 | 4.79 | 1.16 | 10.4 | 26.6 |
| 11 | weather_temperature_2m_paris_future_24h | Float32 | 0 (0.0%) | 215 (21.5%) | 10.7 | 4.67 | 1.16 | 10.4 | 26.6 |
| 12 | weather_precipitation_paris_future_1h | Float32 | 0 (0.0%) | 18 (1.8%) | 0.0510 | 0.242 | 0.00 | 0.00 | 3.00 |
| 13 | weather_precipitation_paris_future_24h | Float32 | 0 (0.0%) | 18 (1.8%) | 0.0517 | 0.242 | 0.00 | 0.00 | 3.00 |
| 14 | weather_wind_speed_10m_paris_future_1h | Float32 | 0 (0.0%) | 617 (61.7%) | 12.2 | 5.65 | 1.08 | 11.4 | 29.9 |
| 15 | weather_wind_speed_10m_paris_future_24h | Float32 | 0 (0.0%) | 621 (62.1%) | 12.3 | 5.65 | 1.08 | 11.5 | 29.9 |
| 16 | weather_cloud_cover_paris_future_1h | Float32 | 0 (0.0%) | 82 (8.2%) | 52.8 | 45.0 | 0.00 | 62.0 | 100. |
| 17 | weather_cloud_cover_paris_future_24h | Float32 | 0 (0.0%) | 82 (8.2%) | 54.5 | 44.9 | 0.00 | 72.0 | 100. |
| 18 | weather_soil_moisture_1_to_3cm_paris_future_1h | Float32 | 1000 (100.0%) | ||||||
| 19 | weather_soil_moisture_1_to_3cm_paris_future_24h | Float32 | 1000 (100.0%) | ||||||
| 20 | weather_relative_humidity_2m_paris_future_1h | Float32 | 0 (0.0%) | 73 (7.3%) | 56.4 | 17.2 | 24.0 | 56.0 | 96.0 |
| 21 | weather_relative_humidity_2m_paris_future_24h | Float32 | 0 (0.0%) | 73 (7.3%) | 56.8 | 17.2 | 24.0 | 57.0 | 96.0 |
| 22 | weather_temperature_2m_lyon_future_1h | Float32 | 0 (0.0%) | 249 (24.9%) | 11.3 | 5.23 | -0.465 | 10.8 | 24.6 |
| 23 | weather_temperature_2m_lyon_future_24h | Float32 | 0 (0.0%) | 245 (24.5%) | 11.2 | 5.14 | -0.465 | 10.8 | 24.6 |
| 24 | weather_precipitation_lyon_future_1h | Float32 | 0 (0.0%) | 32 (3.2%) | 0.131 | 0.584 | 0.00 | 0.00 | 7.60 |
| 25 | weather_precipitation_lyon_future_24h | Float32 | 0 (0.0%) | 35 (3.5%) | 0.155 | 0.630 | 0.00 | 0.00 | 7.60 |
| 26 | weather_wind_speed_10m_lyon_future_1h | Float32 | 0 (0.0%) | 571 (57.1%) | 10.6 | 6.92 | 0.00 | 8.50 | 39.2 |
| 27 | weather_wind_speed_10m_lyon_future_24h | Float32 | 0 (0.0%) | 574 (57.4%) | 10.6 | 6.95 | 0.00 | 8.40 | 39.2 |
| 28 | weather_cloud_cover_lyon_future_1h | Float32 | 0 (0.0%) | 82 (8.2%) | 54.0 | 46.0 | 0.00 | 71.0 | 101. |
| 29 | weather_cloud_cover_lyon_future_24h | Float32 | 0 (0.0%) | 81 (8.1%) | 55.9 | 45.9 | 0.00 | 83.0 | 101. |
| 30 | weather_soil_moisture_1_to_3cm_lyon_future_1h | Float32 | 1000 (100.0%) | ||||||
| 31 | weather_soil_moisture_1_to_3cm_lyon_future_24h | Float32 | 1000 (100.0%) | ||||||
| 32 | weather_relative_humidity_2m_lyon_future_1h | Float32 | 0 (0.0%) | 77 (7.7%) | 63.5 | 19.4 | 22.0 | 62.0 | 98.0 |
| 33 | weather_relative_humidity_2m_lyon_future_24h | Float32 | 0 (0.0%) | 77 (7.7%) | 64.4 | 19.8 | 22.0 | 63.0 | 98.0 |
| 34 | weather_temperature_2m_marseille_future_1h | Float32 | 0 (0.0%) | 161 (16.1%) | 14.0 | 2.78 | 4.78 | 14.5 | 21.3 |
| 35 | weather_temperature_2m_marseille_future_24h | Float32 | 0 (0.0%) | 160 (16.0%) | 14.0 | 2.79 | 4.78 | 14.6 | 21.3 |
| 36 | weather_precipitation_marseille_future_1h | Float32 | 0 (0.0%) | 29 (2.9%) | 0.0983 | 0.493 | 0.00 | 0.00 | 6.10 |
| 37 | weather_precipitation_marseille_future_24h | Float32 | 0 (0.0%) | 29 (2.9%) | 0.114 | 0.505 | 0.00 | 0.00 | 6.10 |
| 38 | weather_wind_speed_10m_marseille_future_1h | Float32 | 0 (0.0%) | 734 (73.4%) | 17.5 | 14.2 | 0.805 | 12.7 | 68.6 |
| 39 | weather_wind_speed_10m_marseille_future_24h | Float32 | 0 (0.0%) | 742 (74.2%) | 17.9 | 14.2 | 0.805 | 13.1 | 68.6 |
| 40 | weather_cloud_cover_marseille_future_1h | Float32 | 0 (0.0%) | 85 (8.5%) | 51.9 | 45.7 | 0.00 | 60.0 | 100. |
| 41 | weather_cloud_cover_marseille_future_24h | Float32 | 0 (0.0%) | 85 (8.5%) | 53.3 | 45.5 | 0.00 | 66.0 | 100. |
| 42 | weather_soil_moisture_1_to_3cm_marseille_future_1h | Float32 | 1000 (100.0%) | ||||||
| 43 | weather_soil_moisture_1_to_3cm_marseille_future_24h | Float32 | 1000 (100.0%) | ||||||
| 44 | weather_relative_humidity_2m_marseille_future_1h | Float32 | 0 (0.0%) | 69 (6.9%) | 62.6 | 14.0 | 27.0 | 61.0 | 95.0 |
| 45 | weather_relative_humidity_2m_marseille_future_24h | Float32 | 0 (0.0%) | 69 (6.9%) | 63.1 | 14.4 | 27.0 | 61.0 | 95.0 |
| 46 | weather_temperature_2m_toulouse_future_1h | Float32 | 0 (0.0%) | 223 (22.3%) | 12.4 | 4.39 | 1.48 | 12.1 | 27.2 |
| 47 | weather_temperature_2m_toulouse_future_24h | Float32 | 0 (0.0%) | 221 (22.1%) | 12.3 | 4.35 | 1.48 | 12.1 | 27.2 |
| 48 | weather_precipitation_toulouse_future_1h | Float32 | 0 (0.0%) | 16 (1.6%) | 0.0487 | 0.192 | 0.00 | 0.00 | 2.20 |
| 49 | weather_precipitation_toulouse_future_24h | Float32 | 0 (0.0%) | 18 (1.8%) | 0.0641 | 0.228 | 0.00 | 0.00 | 2.20 |
| 50 | weather_wind_speed_10m_toulouse_future_1h | Float32 | 0 (0.0%) | 668 (66.8%) | 13.1 | 6.86 | 0.360 | 12.3 | 39.5 |
| 51 | weather_wind_speed_10m_toulouse_future_24h | Float32 | 0 (0.0%) | 669 (66.9%) | 13.0 | 6.86 | 0.360 | 12.1 | 39.5 |
| 52 | weather_cloud_cover_toulouse_future_1h | Float32 | 0 (0.0%) | 84 (8.4%) | 59.9 | 44.6 | 0.00 | 92.0 | 101. |
| 53 | weather_cloud_cover_toulouse_future_24h | Float32 | 0 (0.0%) | 85 (8.5%) | 61.8 | 44.1 | 0.00 | 98.0 | 101. |
| 54 | weather_soil_moisture_1_to_3cm_toulouse_future_1h | Float32 | 1000 (100.0%) | ||||||
| 55 | weather_soil_moisture_1_to_3cm_toulouse_future_24h | Float32 | 1000 (100.0%) | ||||||
| 56 | weather_relative_humidity_2m_toulouse_future_1h | Float32 | 0 (0.0%) | 76 (7.6%) | 62.4 | 19.4 | 22.0 | 60.0 | 98.0 |
| 57 | weather_relative_humidity_2m_toulouse_future_24h | Float32 | 0 (0.0%) | 76 (7.6%) | 63.1 | 19.7 | 22.0 | 61.0 | 98.0 |
| 58 | weather_temperature_2m_lille_future_1h | Float32 | 0 (0.0%) | 226 (22.6%) | 8.43 | 4.71 | -0.850 | 7.65 | 24.5 |
| 59 | weather_temperature_2m_lille_future_24h | Float32 | 0 (0.0%) | 220 (22.0%) | 8.39 | 4.63 | -0.850 | 7.65 | 24.5 |
| 60 | weather_precipitation_lille_future_1h | Float32 | 0 (0.0%) | 17 (1.7%) | 0.0482 | 0.249 | 0.00 | 0.00 | 5.80 |
| 61 | weather_precipitation_lille_future_24h | Float32 | 0 (0.0%) | 18 (1.8%) | 0.0522 | 0.253 | 0.00 | 0.00 | 5.80 |
| 62 | weather_wind_speed_10m_lille_future_1h | Float32 | 0 (0.0%) | 692 (69.2%) | 14.1 | 7.04 | 0.720 | 13.6 | 43.5 |
| 63 | weather_wind_speed_10m_lille_future_24h | Float32 | 0 (0.0%) | 693 (69.3%) | 14.1 | 7.08 | 0.720 | 13.6 | 43.5 |
| 64 | weather_cloud_cover_lille_future_1h | Float32 | 0 (0.0%) | 87 (8.7%) | 49.8 | 43.6 | 0.00 | 32.0 | 101. |
| 65 | weather_cloud_cover_lille_future_24h | Float32 | 0 (0.0%) | 87 (8.7%) | 50.9 | 43.7 | 0.00 | 36.0 | 101. |
| 66 | weather_soil_moisture_1_to_3cm_lille_future_1h | Float32 | 1000 (100.0%) | ||||||
| 67 | weather_soil_moisture_1_to_3cm_lille_future_24h | Float32 | 1000 (100.0%) | ||||||
| 68 | weather_relative_humidity_2m_lille_future_1h | Float32 | 0 (0.0%) | 69 (6.9%) | 64.2 | 16.5 | 29.0 | 65.0 | 97.0 |
| 69 | weather_relative_humidity_2m_lille_future_24h | Float32 | 0 (0.0%) | 69 (6.9%) | 64.5 | 16.5 | 29.0 | 65.0 | 97.0 |
| 70 | weather_temperature_2m_limoges_future_1h | Float32 | 0 (0.0%) | 279 (27.9%) | 9.89 | 5.99 | -4.30 | 9.60 | 25.7 |
| 71 | weather_temperature_2m_limoges_future_24h | Float32 | 0 (0.0%) | 277 (27.7%) | 9.79 | 5.90 | -4.30 | 9.50 | 25.7 |
| 72 | weather_precipitation_limoges_future_1h | Float32 | 0 (0.0%) | 25 (2.5%) | 0.0960 | 0.480 | 0.00 | 0.00 | 6.70 |
| 73 | weather_precipitation_limoges_future_24h | Float32 | 0 (0.0%) | 25 (2.5%) | 0.103 | 0.483 | 0.00 | 0.00 | 6.70 |
| 74 | weather_wind_speed_10m_limoges_future_1h | Float32 | 0 (0.0%) | 466 (46.6%) | 7.90 | 4.39 | 0.00 | 6.62 | 21.6 |
| 75 | weather_wind_speed_10m_limoges_future_24h | Float32 | 0 (0.0%) | 474 (47.4%) | 7.94 | 4.40 | 0.00 | 6.83 | 21.6 |
| 76 | weather_cloud_cover_limoges_future_1h | Float32 | 0 (0.0%) | 83 (8.3%) | 54.4 | 45.2 | 0.00 | 69.0 | 100. |
| 77 | weather_cloud_cover_limoges_future_24h | Float32 | 0 (0.0%) | 83 (8.3%) | 56.2 | 45.0 | 0.00 | 79.0 | 100. |
| 78 | weather_soil_moisture_1_to_3cm_limoges_future_1h | Float32 | 1000 (100.0%) | ||||||
| 79 | weather_soil_moisture_1_to_3cm_limoges_future_24h | Float32 | 1000 (100.0%) | ||||||
| 80 | weather_relative_humidity_2m_limoges_future_1h | Float32 | 0 (0.0%) | 81 (8.1%) | 68.6 | 23.4 | 20.0 | 71.0 | 100. |
| 81 | weather_relative_humidity_2m_limoges_future_24h | Float32 | 0 (0.0%) | 81 (8.1%) | 69.4 | 23.5 | 20.0 | 72.0 | 100. |
| 82 | weather_temperature_2m_nantes_future_1h | Float32 | 0 (0.0%) | 245 (24.5%) | 9.76 | 5.31 | -2.47 | 10.2 | 22.7 |
| 83 | weather_temperature_2m_nantes_future_24h | Float32 | 0 (0.0%) | 243 (24.3%) | 9.67 | 5.22 | -2.47 | 10.1 | 22.7 |
| 84 | weather_precipitation_nantes_future_1h | Float32 | 0 (0.0%) | 14 (1.4%) | 0.0275 | 0.139 | 0.00 | 0.00 | 2.20 |
| 85 | weather_precipitation_nantes_future_24h | Float32 | 0 (0.0%) | 15 (1.5%) | 0.0328 | 0.157 | 0.00 | 0.00 | 2.20 |
| 86 | weather_wind_speed_10m_nantes_future_1h | Float32 | 0 (0.0%) | 713 (71.3%) | 16.0 | 7.37 | 1.48 | 14.3 | 42.3 |
| 87 | weather_wind_speed_10m_nantes_future_24h | Float32 | 0 (0.0%) | 720 (72.0%) | 16.1 | 7.48 | 1.48 | 14.5 | 42.3 |
| 88 | weather_cloud_cover_nantes_future_1h | Float32 | 0 (0.0%) | 87 (8.7%) | 49.7 | 45.0 | 0.00 | 32.0 | 100. |
| 89 | weather_cloud_cover_nantes_future_24h | Float32 | 0 (0.0%) | 87 (8.7%) | 51.2 | 44.9 | 0.00 | 47.0 | 100. |
| 90 | weather_soil_moisture_1_to_3cm_nantes_future_1h | Float32 | 1000 (100.0%) | ||||||
| 91 | weather_soil_moisture_1_to_3cm_nantes_future_24h | Float32 | 1000 (100.0%) | ||||||
| 92 | weather_relative_humidity_2m_nantes_future_1h | Float32 | 0 (0.0%) | 59 (5.9%) | 72.8 | 15.7 | 39.0 | 74.0 | 98.0 |
| 93 | weather_relative_humidity_2m_nantes_future_24h | Float32 | 0 (0.0%) | 59 (5.9%) | 72.9 | 15.8 | 39.0 | 74.0 | 98.0 |
| 94 | weather_temperature_2m_strasbourg_future_1h | Float32 | 0 (0.0%) | 236 (23.6%) | 9.53 | 4.83 | -0.563 | 9.04 | 26.2 |
| 95 | weather_temperature_2m_strasbourg_future_24h | Float32 | 0 (0.0%) | 230 (23.0%) | 9.46 | 4.74 | -0.563 | 9.04 | 26.2 |
| 96 | weather_precipitation_strasbourg_future_1h | Float32 | 0 (0.0%) | 21 (2.1%) | 0.0675 | 0.246 | 0.00 | 0.00 | 2.80 |
| 97 | weather_precipitation_strasbourg_future_24h | Float32 | 0 (0.0%) | 25 (2.5%) | 0.0925 | 0.349 | 0.00 | 0.00 | 4.30 |
| 98 | weather_wind_speed_10m_strasbourg_future_1h | Float32 | 0 (0.0%) | 556 (55.6%) | 10.3 | 5.50 | 0.509 | 9.69 | 29.9 |
| 99 | weather_wind_speed_10m_strasbourg_future_24h | Float32 | 0 (0.0%) | 558 (55.8%) | 10.3 | 5.45 | 0.509 | 9.69 | 29.9 |
| 100 | weather_cloud_cover_strasbourg_future_1h | Float32 | 0 (0.0%) | 85 (8.5%) | 64.0 | 42.9 | 0.00 | 99.0 | 100. |
| 101 | weather_cloud_cover_strasbourg_future_24h | Float32 | 0 (0.0%) | 84 (8.4%) | 65.3 | 42.8 | 0.00 | 100. | 100. |
| 102 | weather_soil_moisture_1_to_3cm_strasbourg_future_1h | Float32 | 1000 (100.0%) | ||||||
| 103 | weather_soil_moisture_1_to_3cm_strasbourg_future_24h | Float32 | 1000 (100.0%) | ||||||
| 104 | weather_relative_humidity_2m_strasbourg_future_1h | Float32 | 0 (0.0%) | 70 (7.0%) | 65.1 | 17.3 | 29.0 | 66.0 | 98.0 |
| 105 | weather_relative_humidity_2m_strasbourg_future_24h | Float32 | 0 (0.0%) | 71 (7.1%) | 66.0 | 17.8 | 29.0 | 67.0 | 99.0 |
| 106 | weather_temperature_2m_brest_future_1h | Float32 | 0 (0.0%) | 203 (20.3%) | 8.99 | 4.30 | 0.928 | 9.23 | 23.2 |
| 107 | weather_temperature_2m_brest_future_24h | Float32 | 0 (0.0%) | 199 (19.9%) | 8.89 | 4.21 | 0.928 | 9.03 | 23.2 |
| 108 | weather_precipitation_brest_future_1h | Float32 | 0 (0.0%) | 12 (1.2%) | 0.0332 | 0.118 | 0.00 | 0.00 | 1.40 |
| 109 | weather_precipitation_brest_future_24h | Float32 | 0 (0.0%) | 13 (1.3%) | 0.0387 | 0.131 | 0.00 | 0.00 | 1.50 |
| 110 | weather_wind_speed_10m_brest_future_1h | Float32 | 0 (0.0%) | 779 (77.9%) | 18.0 | 8.97 | 0.805 | 16.7 | 44.9 |
| 111 | weather_wind_speed_10m_brest_future_24h | Float32 | 0 (0.0%) | 786 (78.6%) | 18.2 | 9.11 | 0.805 | 17.0 | 44.9 |
| 112 | weather_cloud_cover_brest_future_1h | Float32 | 0 (0.0%) | 91 (9.1%) | 55.2 | 43.4 | 0.00 | 66.0 | 100. |
| 113 | weather_cloud_cover_brest_future_24h | Float32 | 0 (0.0%) | 91 (9.1%) | 55.5 | 43.4 | 0.00 | 68.0 | 100. |
| 114 | weather_soil_moisture_1_to_3cm_brest_future_1h | Float32 | 1000 (100.0%) | ||||||
| 115 | weather_soil_moisture_1_to_3cm_brest_future_24h | Float32 | 1000 (100.0%) | ||||||
| 116 | weather_relative_humidity_2m_brest_future_1h | Float32 | 0 (0.0%) | 57 (5.7%) | 72.6 | 13.6 | 43.0 | 74.0 | 99.0 |
| 117 | weather_relative_humidity_2m_brest_future_24h | Float32 | 0 (0.0%) | 57 (5.7%) | 72.9 | 13.6 | 43.0 | 75.0 | 99.0 |
| 118 | weather_temperature_2m_bayonne_future_1h | Float32 | 0 (0.0%) | 247 (24.7%) | 12.2 | 5.01 | -0.202 | 12.1 | 29.4 |
| 119 | weather_temperature_2m_bayonne_future_24h | Float32 | 0 (0.0%) | 241 (24.1%) | 12.1 | 4.89 | -0.202 | 12.0 | 29.4 |
| 120 | weather_precipitation_bayonne_future_1h | Float32 | 0 (0.0%) | 26 (2.6%) | 0.0919 | 0.420 | 0.00 | 0.00 | 4.40 |
| 121 | weather_precipitation_bayonne_future_24h | Float32 | 0 (0.0%) | 29 (2.9%) | 0.109 | 0.468 | 0.00 | 0.00 | 5.40 |
| 122 | weather_wind_speed_10m_bayonne_future_1h | Float32 | 0 (0.0%) | 537 (53.7%) | 10.1 | 5.11 | 0.509 | 9.00 | 33.3 |
| 123 | weather_wind_speed_10m_bayonne_future_24h | Float32 | 0 (0.0%) | 548 (54.8%) | 10.4 | 5.45 | 0.509 | 9.11 | 33.3 |
| 124 | weather_cloud_cover_bayonne_future_1h | Float32 | 0 (0.0%) | 87 (8.7%) | 61.4 | 44.2 | -1.00 | 97.0 | 100. |
| 125 | weather_cloud_cover_bayonne_future_24h | Float32 | 0 (0.0%) | 87 (8.7%) | 62.4 | 44.1 | -1.00 | 100. | 100. |
| 126 | weather_soil_moisture_1_to_3cm_bayonne_future_1h | Float32 | 1000 (100.0%) | ||||||
| 127 | weather_soil_moisture_1_to_3cm_bayonne_future_24h | Float32 | 1000 (100.0%) | ||||||
| 128 | weather_relative_humidity_2m_bayonne_future_1h | Float32 | 0 (0.0%) | 70 (7.0%) | 70.5 | 16.1 | 25.0 | 72.0 | 98.0 |
| 129 | weather_relative_humidity_2m_bayonne_future_24h | Float32 | 0 (0.0%) | 69 (6.9%) | 71.0 | 15.9 | 25.0 | 72.0 | 98.0 |
| 130 | cal_hour_of_day_future_1h | Int8 | 0 (0.0%) | 24 (2.4%) | 11.5 | 6.90 | 0.00 | 11.0 | 23.0 |
| 131 | cal_hour_of_day_future_24h | Int8 | 0 (0.0%) | 24 (2.4%) | 11.5 | 6.90 | 0.00 | 11.0 | 23.0 |
| 132 | cal_day_of_week_future_1h | Int8 | 0 (0.0%) | 7 (0.7%) | 4.02 | 1.99 | 1.00 | 4.00 | 7.00 |
| 133 | cal_day_of_week_future_24h | Int8 | 0 (0.0%) | 7 (0.7%) | 4.01 | 2.00 | 1.00 | 4.00 | 7.00 |
| 134 | cal_day_of_year_future_1h | Int16 | 0 (0.0%) | 42 (4.2%) | 109. | 12.0 | 89.0 | 109. | 130. |
| 135 | cal_day_of_year_future_24h | Int16 | 0 (0.0%) | 42 (4.2%) | 110. | 12.0 | 90.0 | 110. | 131. |
| 136 | cal_year_future_1h | Int32 | 0 (0.0%) | 1 (0.1%) | 2.02e+03 | 0.00 | |||
| 137 | cal_year_future_24h | Int32 | 0 (0.0%) | 1 (0.1%) | 2.02e+03 | 0.00 | |||
| 138 | cal_is_holiday_future_1h | Boolean | 0 (0.0%) | 2 (0.2%) | |||||
| 139 | cal_is_holiday_future_24h | Boolean | 0 (0.0%) | 2 (0.2%) |
Let’s build training and evaluation targets for all possible horizons from 1 to 24 hours.
horizons = range(1, 25)
target_column_name_pattern = "load_mw_horizon_{horizon}h"
@skrub.deferred
def build_targets(prediction_time, electricity, horizons):
    # For each horizon h, shift the load column backwards by h steps so that
    # each timestamp's row holds the load observed h hours later, then align
    # those future values with the prediction times via a join on the
    # timestamp column.
    return prediction_time.join(
        electricity.with_columns(
            [
                pl.col("load_mw")
                .shift(-h)
                .alias(target_column_name_pattern.format(horizon=h))
                for h in horizons
            ]
        ),
        left_on="prediction_time",
        right_on="time",
    )
targets = build_targets(prediction_time, electricity, horizons)
targets
| prediction_time | load_mw | load_mw_horizon_1h | load_mw_horizon_2h | load_mw_horizon_3h | load_mw_horizon_4h | load_mw_horizon_5h | load_mw_horizon_6h | load_mw_horizon_7h | load_mw_horizon_8h | load_mw_horizon_9h | load_mw_horizon_10h | load_mw_horizon_11h | load_mw_horizon_12h | load_mw_horizon_13h | load_mw_horizon_14h | load_mw_horizon_15h | load_mw_horizon_16h | load_mw_horizon_17h | load_mw_horizon_18h | load_mw_horizon_19h | load_mw_horizon_20h | load_mw_horizon_21h | load_mw_horizon_22h | load_mw_horizon_23h | load_mw_horizon_24h |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2021-03-30 00:00:00+00:00 | 46395.0 | 44269.0 | 43874.0 | 46197.0 | 51913.0 | 56939.0 | 58329.0 | 57671.0 | 55421.0 | 54578.0 | 54866.0 | 53069.0 | 50920.0 | 49051.0 | 47607.0 | 46991.0 | 48358.0 | 50709.0 | 51211.0 | 49234.0 | 49122.0 | 49962.0 | 47394.0 | 45452.0 | 44510.0 |
| 2021-03-30 01:00:00+00:00 | 44269.0 | 43874.0 | 46197.0 | 51913.0 | 56939.0 | 58329.0 | 57671.0 | 55421.0 | 54578.0 | 54866.0 | 53069.0 | 50920.0 | 49051.0 | 47607.0 | 46991.0 | 48358.0 | 50709.0 | 51211.0 | 49234.0 | 49122.0 | 49962.0 | 47394.0 | 45452.0 | 44510.0 | 42417.0 |
| 2021-03-30 02:00:00+00:00 | 43874.0 | 46197.0 | 51913.0 | 56939.0 | 58329.0 | 57671.0 | 55421.0 | 54578.0 | 54866.0 | 53069.0 | 50920.0 | 49051.0 | 47607.0 | 46991.0 | 48358.0 | 50709.0 | 51211.0 | 49234.0 | 49122.0 | 49962.0 | 47394.0 | 45452.0 | 44510.0 | 42417.0 | 41633.0 |
| 2021-03-30 03:00:00+00:00 | 46197.0 | 51913.0 | 56939.0 | 58329.0 | 57671.0 | 55421.0 | 54578.0 | 54866.0 | 53069.0 | 50920.0 | 49051.0 | 47607.0 | 46991.0 | 48358.0 | 50709.0 | 51211.0 | 49234.0 | 49122.0 | 49962.0 | 47394.0 | 45452.0 | 44510.0 | 42417.0 | 41633.0 | 43640.0 |
| 2021-03-30 04:00:00+00:00 | 51913.0 | 56939.0 | 58329.0 | 57671.0 | 55421.0 | 54578.0 | 54866.0 | 53069.0 | 50920.0 | 49051.0 | 47607.0 | 46991.0 | 48358.0 | 50709.0 | 51211.0 | 49234.0 | 49122.0 | 49962.0 | 47394.0 | 45452.0 | 44510.0 | 42417.0 | 41633.0 | 43640.0 | 48555.0 |
| 2021-05-10 11:00:00+00:00 | 51473.0 | 50153.0 | 48774.0 | 47256.0 | 46422.0 | 48198.0 | 49468.0 | 47530.0 | 46548.0 | 47150.0 | 48079.0 | 45521.0 | 43378.0 | 42265.0 | 40329.0 | 39815.0 | 41663.0 | 45120.0 | 49615.0 | 52385.0 | 53007.0 | 52498.0 | 53188.0 | 54010.0 | 52509.0 |
| 2021-05-10 12:00:00+00:00 | 50153.0 | 48774.0 | 47256.0 | 46422.0 | 48198.0 | 49468.0 | 47530.0 | 46548.0 | 47150.0 | 48079.0 | 45521.0 | 43378.0 | 42265.0 | 40329.0 | 39815.0 | 41663.0 | 45120.0 | 49615.0 | 52385.0 | 53007.0 | 52498.0 | 53188.0 | 54010.0 | 52509.0 | 50779.0 |
| 2021-05-10 13:00:00+00:00 | 48774.0 | 47256.0 | 46422.0 | 48198.0 | 49468.0 | 47530.0 | 46548.0 | 47150.0 | 48079.0 | 45521.0 | 43378.0 | 42265.0 | 40329.0 | 39815.0 | 41663.0 | 45120.0 | 49615.0 | 52385.0 | 53007.0 | 52498.0 | 53188.0 | 54010.0 | 52509.0 | 50779.0 | 48999.0 |
| 2021-05-10 14:00:00+00:00 | 47256.0 | 46422.0 | 48198.0 | 49468.0 | 47530.0 | 46548.0 | 47150.0 | 48079.0 | 45521.0 | 43378.0 | 42265.0 | 40329.0 | 39815.0 | 41663.0 | 45120.0 | 49615.0 | 52385.0 | 53007.0 | 52498.0 | 53188.0 | 54010.0 | 52509.0 | 50779.0 | 48999.0 | 47679.0 |
| 2021-05-10 15:00:00+00:00 | 46422.0 | 48198.0 | 49468.0 | 47530.0 | 46548.0 | 47150.0 | 48079.0 | 45521.0 | 43378.0 | 42265.0 | 40329.0 | 39815.0 | 41663.0 | 45120.0 | 49615.0 | 52385.0 | 53007.0 | 52498.0 | 53188.0 | 54010.0 | 52509.0 | 50779.0 | 48999.0 | 47679.0 | 46931.0 |
| Column | Column name | dtype | Null values | Unique values | Mean | Std | Min | Median | Max |
|---|---|---|---|---|---|---|---|---|---|
| 0 | prediction_time | Datetime | 0 (0.0%) | 1000 (100.0%) | | | 2021-03-30T00:00:00+00:00 | | 2021-05-10T15:00:00+00:00 |
| 1 | load_mw | Float64 | 0 (0.0%) | 965 (96.5%) | 5.08e+04 | 6.37e+03 | 3.35e+04 | 5.06e+04 | 6.95e+04 |
| 2 | load_mw_horizon_1h | Float64 | 0 (0.0%) | 965 (96.5%) | 5.08e+04 | 6.37e+03 | 3.35e+04 | 5.06e+04 | 6.95e+04 |
| 3 | load_mw_horizon_2h | Float64 | 0 (0.0%) | 965 (96.5%) | 5.08e+04 | 6.36e+03 | 3.35e+04 | 5.06e+04 | 6.95e+04 |
| 4 | load_mw_horizon_3h | Float64 | 0 (0.0%) | 965 (96.5%) | 5.08e+04 | 6.36e+03 | 3.35e+04 | 5.06e+04 | 6.95e+04 |
| 5 | load_mw_horizon_4h | Float64 | 0 (0.0%) | 965 (96.5%) | 5.08e+04 | 6.36e+03 | 3.35e+04 | 5.06e+04 | 6.95e+04 |
| 6 | load_mw_horizon_5h | Float64 | 0 (0.0%) | 965 (96.5%) | 5.08e+04 | 6.36e+03 | 3.35e+04 | 5.05e+04 | 6.95e+04 |
| 7 | load_mw_horizon_6h | Float64 | 0 (0.0%) | 965 (96.5%) | 5.08e+04 | 6.36e+03 | 3.35e+04 | 5.05e+04 | 6.95e+04 |
| 8 | load_mw_horizon_7h | Float64 | 0 (0.0%) | 965 (96.5%) | 5.08e+04 | 6.36e+03 | 3.35e+04 | 5.05e+04 | 6.95e+04 |
| 9 | load_mw_horizon_8h | Float64 | 0 (0.0%) | 964 (96.4%) | 5.08e+04 | 6.36e+03 | 3.35e+04 | 5.05e+04 | 6.95e+04 |
| 10 | load_mw_horizon_9h | Float64 | 0 (0.0%) | 964 (96.4%) | 5.07e+04 | 6.36e+03 | 3.35e+04 | 5.05e+04 | 6.95e+04 |
| 11 | load_mw_horizon_10h | Float64 | 0 (0.0%) | 964 (96.4%) | 5.07e+04 | 6.37e+03 | 3.35e+04 | 5.05e+04 | 6.95e+04 |
| 12 | load_mw_horizon_11h | Float64 | 0 (0.0%) | 965 (96.5%) | 5.07e+04 | 6.38e+03 | 3.35e+04 | 5.04e+04 | 6.95e+04 |
| 13 | load_mw_horizon_12h | Float64 | 0 (0.0%) | 965 (96.5%) | 5.07e+04 | 6.38e+03 | 3.35e+04 | 5.04e+04 | 6.95e+04 |
| 14 | load_mw_horizon_13h | Float64 | 0 (0.0%) | 965 (96.5%) | 5.07e+04 | 6.38e+03 | 3.35e+04 | 5.04e+04 | 6.95e+04 |
| 15 | load_mw_horizon_14h | Float64 | 0 (0.0%) | 965 (96.5%) | 5.07e+04 | 6.38e+03 | 3.35e+04 | 5.04e+04 | 6.95e+04 |
| 16 | load_mw_horizon_15h | Float64 | 0 (0.0%) | 965 (96.5%) | 5.07e+04 | 6.38e+03 | 3.35e+04 | 5.04e+04 | 6.95e+04 |
| 17 | load_mw_horizon_16h | Float64 | 0 (0.0%) | 965 (96.5%) | 5.07e+04 | 6.38e+03 | 3.35e+04 | 5.04e+04 | 6.95e+04 |
| 18 | load_mw_horizon_17h | Float64 | 0 (0.0%) | 965 (96.5%) | 5.07e+04 | 6.38e+03 | 3.35e+04 | 5.05e+04 | 6.95e+04 |
| 19 | load_mw_horizon_18h | Float64 | 0 (0.0%) | 965 (96.5%) | 5.07e+04 | 6.38e+03 | 3.35e+04 | 5.05e+04 | 6.95e+04 |
| 20 | load_mw_horizon_19h | Float64 | 0 (0.0%) | 966 (96.6%) | 5.07e+04 | 6.38e+03 | 3.35e+04 | 5.05e+04 | 6.95e+04 |
| 21 | load_mw_horizon_20h | Float64 | 0 (0.0%) | 966 (96.6%) | 5.07e+04 | 6.38e+03 | 3.35e+04 | 5.05e+04 | 6.95e+04 |
| 22 | load_mw_horizon_21h | Float64 | 0 (0.0%) | 966 (96.6%) | 5.07e+04 | 6.38e+03 | 3.35e+04 | 5.05e+04 | 6.95e+04 |
| 23 | load_mw_horizon_22h | Float64 | 0 (0.0%) | 965 (96.5%) | 5.07e+04 | 6.38e+03 | 3.35e+04 | 5.05e+04 | 6.95e+04 |
| 24 | load_mw_horizon_23h | Float64 | 0 (0.0%) | 966 (96.6%) | 5.07e+04 | 6.38e+03 | 3.35e+04 | 5.05e+04 | 6.95e+04 |
| 25 | load_mw_horizon_24h | Float64 | 0 (0.0%) | 966 (96.6%) | 5.07e+04 | 6.38e+03 | 3.35e+04 | 5.05e+04 | 6.95e+04 |
For now, let’s focus on the last horizon (24 hours) and train a model that predicts the electricity load 24 hours ahead.
horizon_of_interest = horizons[-1] # Focus on the 24-hour horizon
target_column_name = target_column_name_pattern.format(horizon=horizon_of_interest)
predicted_target_column_name = "predicted_" + target_column_name
target = targets[target_column_name].skb.mark_as_y()
target
| load_mw_horizon_24h |
|---|
| 44510.0 |
| 42417.0 |
| 41633.0 |
| 43640.0 |
| 48555.0 |
| 52509.0 |
| 50779.0 |
| 48999.0 |
| 47679.0 |
| 46931.0 |
| Column | Column name | dtype | Null values | Unique values | Mean | Std | Min | Median | Max |
|---|---|---|---|---|---|---|---|---|---|
| 0 | load_mw_horizon_24h | Float64 | 0 (0.0%) | 966 (96.6%) | 5.07e+04 | 6.38e+03 | 3.35e+04 | 5.05e+04 | 6.95e+04 |
Let’s define our first single-output prediction pipeline. It chains our previous feature engineering steps with a skrub.DropCols step that drops some columns we do not want to use as features, followed by a HistGradientBoostingRegressor model from scikit-learn.
The skrub.choose_from, skrub.choose_float, and skrub.choose_int
functions are used to define hyperparameters that can be tuned via
cross-validated randomized search.
from sklearn.ensemble import HistGradientBoostingRegressor
import skrub.selectors as s
features_with_dropped_cols = features.skb.apply(
skrub.DropCols(
cols=skrub.choose_from(
{
"none": s.glob(""), # No column has an empty name.
"load": s.glob("load_*"),
"rolling_load": s.glob("load_mw_rolling_*"),
"weather": s.glob("weather_*"),
"temperature": s.glob("weather_temperature_*"),
"moisture": s.glob("weather_moisture_*"),
"cloud_cover": s.glob("weather_cloud_cover_*"),
"calendar": s.glob("cal_*"),
"holiday": s.glob("cal_is_holiday*"),
"future_1h": s.glob("*_future_1h"),
"future_24h": s.glob("*_future_24h"),
"non_paris_weather": s.glob("weather_*") & ~s.glob("weather_*_paris_*"),
},
name="dropped_cols",
)
)
)
hgbr_predictions = features_with_dropped_cols.skb.apply(
HistGradientBoostingRegressor(
random_state=0,
loss=skrub.choose_from(["squared_error", "poisson", "gamma"], name="loss"),
learning_rate=skrub.choose_float(
0.01, 1, default=0.1, log=True, name="learning_rate"
),
max_leaf_nodes=skrub.choose_int(
3, 300, default=30, log=True, name="max_leaf_nodes"
),
),
y=target,
)
hgbr_predictions
| load_mw_horizon_24h |
|---|
| 44593.27969335909 |
| 42376.41310188364 |
| 41729.31758180694 |
| 43485.360964862004 |
| 48788.04317090869 |
| 52537.52521735487 |
| 50802.48171719284 |
| 49074.342186199145 |
| 47543.76458849605 |
| 47067.28272541773 |
| Column | Column name | dtype | Null values | Unique values | Mean | Std | Min | Median | Max |
|---|---|---|---|---|---|---|---|---|---|
| 0 | load_mw_horizon_24h | Float64 | 0 (0.0%) | 1000 (100.0%) | 5.07e+04 | 6.36e+03 | 3.39e+04 | 5.06e+04 | 6.93e+04 |
The hgbr_predictions expression captures the whole expression graph, including the feature engineering steps, the target variable, and the model training step.
In particular, the input data keys for the full pipeline can be inspected as follows:
hgbr_predictions.skb.get_data().keys()
dict_keys(['prediction_start_time', 'prediction_end_time', 'historical_data_start_time', 'historical_data_end_time', 'data_source_folder', 'city_names'])
Furthermore, the hyper-parameters of the full pipeline can be retrieved as follows:
hgbr_pipeline = hgbr_predictions.skb.get_pipeline()
hgbr_pipeline.describe_params()
{'dropped_cols': 'none',
'learning_rate': 0.1,
'loss': 'squared_error',
'max_leaf_nodes': 30}
When running this notebook locally, you can also interactively inspect all the steps of the DAG using the following (once uncommented):
# hgbr_predictions.skb.full_report()
Since we passed input values to all the upstream skrub variables, skrub automatically evaluates the whole expression graph (train and predict on the same data) so that we can interactively check that everything works as expected.
Let’s use altair to visualize the predictions against the target values for the last week of the prediction time range used to train the model. This allows us to check that the model can (over)fit the training data with the features at hand.
altair.Chart(
pl.concat(
[
targets.skb.preview(),
hgbr_predictions.rename(
{target_column_name: predicted_target_column_name}
).skb.preview(),
],
how="horizontal",
).tail(24 * 7)
).transform_fold(
[target_column_name, predicted_target_column_name],
).mark_line(
tooltip=True
).encode(
x="prediction_time:T", y="value:Q", color="key:N"
).interactive()
Assessing the model performance via cross-validation#
Being able to fit the training data is not enough. We need to assess the ability of the training pipeline to learn a predictive model that can generalize to unseen data.
Furthermore, we want to assess the uncertainty of this estimate of the generalization performance via time-based cross-validation, also known as backtesting.
scikit-learn provides a TimeSeriesSplit splitter that offers a convenient way to split temporal data: in each fold, the training data always precedes the test data, which implies that the training set grows as the fold index increases. The splitter exposes parameters to control the size of the training and test sets, as well as a gap between them, to avoid leakage when the model relies on lagged features.
In the example below, we cap the training data at roughly 2 years’ worth of observations and set the test size to 24 weeks. We also define a gap of 1 week between the training and test sets.
Let’s check those statistics by iterating over the different folds provided by the splitter.
from sklearn.model_selection import TimeSeriesSplit
max_train_size = 2 * 52 * 24 * 7 # max ~2 years of training data
test_size = 24 * 7 * 24 # 24 weeks of test data
gap = 7 * 24 # 1 week gap between train and test sets
ts_cv_5 = TimeSeriesSplit(
n_splits=5, max_train_size=max_train_size, test_size=test_size, gap=gap
)
for fold_idx, (train_idx, test_idx) in enumerate(
ts_cv_5.split(prediction_time.skb.eval())
):
print(f"CV iteration #{fold_idx}")
train_datetimes = prediction_time.skb.eval()[train_idx]
test_datetimes = prediction_time.skb.eval()[test_idx]
print(
f"Train: {train_datetimes.shape[0]} rows, "
f"Test: {test_datetimes.shape[0]} rows"
)
print(f"Train time range: {train_datetimes[0, 0]} to " f"{train_datetimes[-1, 0]} ")
print(f"Test time range: {test_datetimes[0, 0]} to " f"{test_datetimes[-1, 0]} ")
print()
CV iteration #0
Train: 16224 rows, Test: 4032 rows
Train time range: 2021-03-30 00:00:00+00:00 to 2023-02-03 23:00:00+00:00
Test time range: 2023-02-11 00:00:00+00:00 to 2023-07-28 23:00:00+00:00
CV iteration #1
Train: 17472 rows, Test: 4032 rows
Train time range: 2021-07-24 00:00:00+00:00 to 2023-07-21 23:00:00+00:00
Test time range: 2023-07-29 00:00:00+00:00 to 2024-01-12 23:00:00+00:00
CV iteration #2
Train: 17472 rows, Test: 4032 rows
Train time range: 2022-01-08 00:00:00+00:00 to 2024-01-05 23:00:00+00:00
Test time range: 2024-01-13 00:00:00+00:00 to 2024-06-28 23:00:00+00:00
CV iteration #3
Train: 17472 rows, Test: 4032 rows
Train time range: 2022-06-25 00:00:00+00:00 to 2024-06-21 23:00:00+00:00
Test time range: 2024-06-29 00:00:00+00:00 to 2024-12-13 23:00:00+00:00
CV iteration #4
Train: 17472 rows, Test: 4032 rows
Train time range: 2022-12-10 00:00:00+00:00 to 2024-12-06 23:00:00+00:00
Test time range: 2024-12-14 00:00:00+00:00 to 2025-05-30 23:00:00+00:00
Once the cross-validation strategy is defined, we pass it to the cross_validate function provided by skrub to compute the cross-validated scores. We include the mean absolute percentage error (MAPE) because it is easy to interpret; however, this metric is not a proper scoring rule, so we also look at the R2 score and the D2 Tweedie deviance scores.
from sklearn.metrics import make_scorer, mean_absolute_percentage_error, get_scorer
from sklearn.metrics import d2_tweedie_score
hgbr_cv_results = hgbr_predictions.skb.cross_validate(
cv=ts_cv_5,
scoring={
"mape": make_scorer(mean_absolute_percentage_error),
"r2": get_scorer("r2"),
"d2_poisson": make_scorer(d2_tweedie_score, power=1.0),
"d2_gamma": make_scorer(d2_tweedie_score, power=2.0),
},
return_train_score=True,
return_pipeline=True,
verbose=1,
n_jobs=-1,
)
hgbr_cv_results.round(3)
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done 5 out of 5 | elapsed: 8.4s finished
| fit_time | score_time | test_mape | train_mape | test_r2 | train_r2 | test_d2_poisson | train_d2_poisson | test_d2_gamma | train_d2_gamma | pipeline | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2.947 | 0.062 | 0.027 | 0.012 | 0.963 | 0.994 | 0.962 | 0.994 | 0.961 | 0.994 | SkrubPipeline(expr=<Apply HistGradientBoosting... |
| 1 | 3.237 | 0.062 | 0.024 | 0.013 | 0.978 | 0.994 | 0.977 | 0.994 | 0.976 | 0.993 | SkrubPipeline(expr=<Apply HistGradientBoosting... |
| 2 | 3.213 | 0.062 | 0.023 | 0.014 | 0.974 | 0.993 | 0.974 | 0.993 | 0.975 | 0.992 | SkrubPipeline(expr=<Apply HistGradientBoosting... |
| 3 | 3.233 | 0.057 | 0.019 | 0.014 | 0.980 | 0.993 | 0.980 | 0.992 | 0.980 | 0.992 | SkrubPipeline(expr=<Apply HistGradientBoosting... |
| 4 | 2.124 | 0.037 | 0.023 | 0.014 | 0.977 | 0.993 | 0.978 | 0.992 | 0.978 | 0.992 | SkrubPipeline(expr=<Apply HistGradientBoosting... |
Across the folds, the model reaches a test MAPE between 1.9% and 2.7% and test R2 scores between 0.963 and 0.980. The train scores are consistently better (MAPE around 1.2–1.4%, R2 around 0.993), which indicates a mild amount of overfitting.
We further analyze our cross-validated model by collecting the predictions on each split.
hgbr_cv_predictions = collect_cv_predictions(
hgbr_cv_results["pipeline"], ts_cv_5, hgbr_predictions, prediction_time
)
hgbr_cv_predictions[0]
| prediction_time | load_mw | predicted_load_mw |
|---|---|---|
| datetime[μs, UTC] | f64 | f64 |
| 2023-02-11 00:00:00 UTC | 59258.0 | 59855.334418 |
| 2023-02-11 01:00:00 UTC | 58654.0 | 59958.654564 |
| 2023-02-11 02:00:00 UTC | 56155.0 | 57666.184522 |
| 2023-02-11 03:00:00 UTC | 54463.0 | 55832.880673 |
| 2023-02-11 04:00:00 UTC | 54698.0 | 57121.984097 |
| … | … | … |
| 2023-07-28 19:00:00 UTC | 38781.0 | 40093.987086 |
| 2023-07-28 20:00:00 UTC | 38455.0 | 39343.771368 |
| 2023-07-28 21:00:00 UTC | 39972.0 | 40738.151594 |
| 2023-07-28 22:00:00 UTC | 39825.0 | 39449.468131 |
| 2023-07-28 23:00:00 UTC | 36822.0 | 35828.293662 |
The first plot is a Lorenz curve. Its x-axis shows the cumulative fraction of observations, sorted by predicted value, and its y-axis the corresponding cumulative share of the observed load.
plot_lorenz_curve(hgbr_cv_predictions).interactive()
The diagonal of the plot corresponds to a model that predicts a constant value and is therefore uninformative. The oracle model is the hypothetical “perfect” model whose output is identical to the observed values, so its ranking of the observations is the best possible. Note, however, that the oracle curve does not simply hug the bottom-right corner of the plot: its curvature is determined by the distribution of the observations. The more the observations consist of many small values and a few large ones, the closer the oracle curve gets to that corner.
A real model lies between the diagonal and the oracle curve. The area between the diagonal and a model’s Lorenz curve is related to the Gini index.
For our model, each oracle curve is not far from the diagonal: the observed values do not contain a few very large values with high variability, so the problem at hand is not too hard in that respect. Looking at the Lorenz curve of each cross-validated model, we see that it is quite close to the oracle curve: the gradient boosting regressor is therefore quite discriminative for our task.
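The construction of such a curve can be sketched with plain numpy. This is a simplified stand-in (on made-up data) for what the plot_lorenz_curve helper defined earlier in the notebook presumably computes:

```python
import numpy as np


def lorenz_curve(y_observed, ranking_values):
    """Cumulative share of the observed load, rows sorted by ranking_values."""
    order = np.argsort(np.asarray(ranking_values))
    sorted_load = np.asarray(y_observed, dtype=float)[order]
    cum_share = np.cumsum(sorted_load) / sorted_load.sum()
    fraction = np.arange(1, sorted_load.size + 1) / sorted_load.size
    return fraction, cum_share


# Made-up hourly loads in MW. Ranking by the observations themselves
# yields the oracle curve; any real model lies between it and the diagonal.
y = np.array([30_000.0, 40_000.0, 50_000.0, 60_000.0, 70_000.0])
fraction, oracle = lorenz_curve(y, y)

# Twice the area between the diagonal and the curve (trapezoidal rule),
# a quantity related to the Gini index.
gap = fraction - oracle
gini = 2 * np.sum((gap[1:] + gap[:-1]) / 2 * np.diff(fraction))
```

Passing a model’s predictions as `ranking_values` instead of `y` would give that model’s Lorenz curve, to be compared against the oracle.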
Next, let’s look at the reliability diagram, which shows the mean predicted load on the x-axis and the mean observed load on the y-axis.
plot_reliability_diagram(hgbr_cv_predictions).interactive().properties(
title="Reliability diagram from cross-validation predictions"
)
The diagonal of the reliability diagram corresponds to the best possible model: for predicted loads falling into a given bin, the mean observed load lies in the same bin. If the curve is above the diagonal, the model predicts values that are too low compared to the observations; if it is below the diagonal, the model predicts values that are too high.
For our cross-validated model, each reliability curve is close to the diagonal. We only observe some mis-calibration for the most extreme values.
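Such a diagram can be sketched by splitting the predictions into quantile bins and comparing per-bin means. The helper below is an illustrative stand-in (on synthetic data), not the notebook’s actual plot_reliability_diagram implementation:

```python
import numpy as np


def reliability_points(y_observed, y_predicted, n_bins=5):
    """Mean predicted vs mean observed value per quantile bin of the predictions."""
    y_observed = np.asarray(y_observed, dtype=float)
    y_predicted = np.asarray(y_predicted, dtype=float)
    edges = np.quantile(y_predicted, np.linspace(0.0, 1.0, n_bins + 1))
    # Assign each prediction to its quantile bin (clip edge cases into range).
    bin_ids = np.clip(
        np.searchsorted(edges, y_predicted, side="right") - 1, 0, n_bins - 1
    )
    mean_pred = np.array([y_predicted[bin_ids == b].mean() for b in range(n_bins)])
    mean_obs = np.array([y_observed[bin_ids == b].mean() for b in range(n_bins)])
    return mean_pred, mean_obs


# Synthetic, well-calibrated predictions: the points should hug the diagonal.
rng = np.random.default_rng(0)
y_pred = rng.uniform(35_000, 65_000, size=1_000)
y_obs = y_pred + rng.normal(0.0, 2_000.0, size=1_000)
mean_pred, mean_obs = reliability_points(y_obs, y_pred)
```

Plotting `mean_obs` against `mean_pred` and comparing to the diagonal gives the reliability diagram; a systematic offset in a bin reveals a calibration problem at that load level.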
plot_residuals_vs_predicted(hgbr_cv_predictions).interactive().properties(
title="Residuals vs Predicted Values from cross-validation predictions"
)
plot_binned_residuals(hgbr_cv_predictions, by="hour").interactive().properties(
title="Residuals by hour of the day from cross-validation predictions"
)
plot_binned_residuals(hgbr_cv_predictions, by="month").interactive().properties(
    title="Residuals by month of the year from cross-validation predictions"
)
ts_cv_2 = TimeSeriesSplit(
n_splits=2, test_size=test_size, max_train_size=max_train_size, gap=24
)
# randomized_search_hgbr = hgbr_predictions.skb.get_randomized_search(
# cv=ts_cv_2,
# scoring="r2",
# n_iter=100,
# fitted=True,
# verbose=1,
# n_jobs=-1,
# )
# # %%
# randomized_search_hgbr.results_.round(3)
# fig = randomized_search_hgbr.plot_results().update_layout(margin=dict(l=200))
# write_json(fig, "parallel_coordinates_hgbr.json")
fig = read_json("parallel_coordinates_hgbr.json")
fig.update_layout(margin=dict(l=200))
# nested_cv_results = skrub.cross_validate(
# environment=hgbr_predictions.skb.get_data(),
# pipeline=randomized_search_hgbr,
# cv=ts_cv_5,
# scoring={
# "r2": get_scorer("r2"),
# "mape": make_scorer(mean_absolute_percentage_error),
# },
# n_jobs=-1,
# return_pipeline=True,
# ).round(3)
# nested_cv_results
# for outer_fold_idx in range(len(nested_cv_results)):
# print(
# nested_cv_results.loc[outer_fold_idx, "pipeline"]
# .results_.loc[0]
# .round(3)
# .to_dict()
# )
Exercise: non-linear feature engineering coupled with linear predictive model#
Now, it is your turn to build a predictive model. We ask you to preprocess the input features with non-linear feature engineering:
first, impute the missing values with a SimpleImputer. Make sure to include the missing-value indicator in the feature set (i.e. look at the add_indicator parameter);
use a SplineTransformer to create non-linear features. Use the default parameters but make sure to set sparse_output=True since the subsequent processing will be faster and more memory efficient with this data structure;
use a VarianceThreshold to remove potentially constant features;
use a SelectKBest to select the most informative features. Let k be chosen from a log-uniform distribution between 100 and 1,000 (i.e. use skrub.choose_int);
use a Nystroem to approximate an RBF kernel. Let n_components be chosen from a log-uniform distribution between 10 and 200 (i.e. use skrub.choose_int);
finally, use a Ridge as the final predictive model. Let alpha be chosen from a log-uniform distribution between 1e-6 and 1e3 (i.e. use skrub.choose_float).
Chain the steps together in a scikit-learn Pipeline built with make_pipeline.
Once the predictive model is defined, apply it to the features_with_dropped_cols expression. Do not forget to pass target as the y variable.
# Here we provide all the imports for creating the predictive model.
from sklearn.feature_selection import SelectKBest, VarianceThreshold
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.kernel_approximation import Nystroem
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer
# Write your code here.
#
#
#
#
#
#
#
#
#
#
#
predictions_ridge = features_with_dropped_cols.skb.apply(
make_pipeline(
SimpleImputer(add_indicator=True),
SplineTransformer(sparse_output=True),
VarianceThreshold(threshold=1e-6),
SelectKBest(
k=skrub.choose_int(100, 1_000, log=True, name="n_selected_splines")
),
Nystroem(
n_components=skrub.choose_int(
10, 200, log=True, name="n_components", default=150
)
),
Ridge(
alpha=skrub.choose_float(1e-6, 1e3, log=True, name="alpha", default=1e-2)
),
),
y=target,
)
predictions_ridge
/home/runner/work/forecasting/forecasting/.pixi/envs/doc/lib/python3.12/site-packages/sklearn/impute/_base.py:637: UserWarning:
Skipping features without any observed values: ['weather_soil_moisture_1_to_3cm_paris_future_1h'
'weather_soil_moisture_1_to_3cm_paris_future_24h'
'weather_soil_moisture_1_to_3cm_lyon_future_1h'
'weather_soil_moisture_1_to_3cm_lyon_future_24h'
'weather_soil_moisture_1_to_3cm_marseille_future_1h'
'weather_soil_moisture_1_to_3cm_marseille_future_24h'
'weather_soil_moisture_1_to_3cm_toulouse_future_1h'
'weather_soil_moisture_1_to_3cm_toulouse_future_24h'
'weather_soil_moisture_1_to_3cm_lille_future_1h'
'weather_soil_moisture_1_to_3cm_lille_future_24h'
'weather_soil_moisture_1_to_3cm_limoges_future_1h'
'weather_soil_moisture_1_to_3cm_limoges_future_24h'
'weather_soil_moisture_1_to_3cm_nantes_future_1h'
'weather_soil_moisture_1_to_3cm_nantes_future_24h'
'weather_soil_moisture_1_to_3cm_strasbourg_future_1h'
'weather_soil_moisture_1_to_3cm_strasbourg_future_24h'
'weather_soil_moisture_1_to_3cm_brest_future_1h'
'weather_soil_moisture_1_to_3cm_brest_future_24h'
'weather_soil_moisture_1_to_3cm_bayonne_future_1h'
'weather_soil_moisture_1_to_3cm_bayonne_future_24h']. At least one non-missing value is needed for imputation with strategy='mean'.
/home/runner/work/forecasting/forecasting/.pixi/envs/doc/lib/python3.12/site-packages/sklearn/feature_selection/_univariate_selection.py:111: RuntimeWarning:
divide by zero encountered in divide
| load_mw_horizon_24h |
|---|
| 46014.51909883921 |
| 43945.23868230855 |
| 43382.416237267855 |
| 42454.81567581224 |
| 46998.72516029906 |
| 55372.42855439137 |
| 52080.78351551673 |
| 49906.38457947337 |
| 49198.7569053052 |
| 46834.45195959956 |
| Column | Column name | dtype | Null values | Unique values | Mean | Std | Min | Median | Max |
|---|---|---|---|---|---|---|---|---|---|
| 0 | load_mw_horizon_24h | Float64 | 0 (0.0%) | 1000 (100.0%) | 5.07e+04 | 5.91e+03 | 3.42e+04 | 5.05e+04 | 6.91e+04 |
Now that you have defined the predictive model, let’s run an analysis similar to the one above. As a sanity check, first plot the forecasts of the model on a subset of the training data.
# Write your code here.
#
#
#
#
#
#
#
#
#
#
#
altair.Chart(
pl.concat(
[
targets.skb.preview(),
predictions_ridge.rename(
{target_column_name: predicted_target_column_name}
).skb.preview(),
],
how="horizontal",
).tail(24 * 7)
).transform_fold(
[target_column_name, predicted_target_column_name],
).mark_line(
tooltip=True
).encode(
x="prediction_time:T", y="value:Q", color="key:N"
).interactive()
Now, let’s evaluate the performance of the model using cross-validation. Use the
time-based cross-validation splitter ts_cv_5 defined earlier. Make sure to compute
the R2 score and the mean absolute percentage error. Return the training scores as
well as the fitted pipeline so that we can run additional analyses.
Does this model perform better or worse than the previous model? Is it underfitting or overfitting?
# Write your code here.
#
#
#
#
#
#
#
#
#
#
#
cv_results_ridge = predictions_ridge.skb.cross_validate(
cv=ts_cv_5,
scoring={
"r2": get_scorer("r2"),
"mape": make_scorer(mean_absolute_percentage_error),
},
return_train_score=True,
return_pipeline=True,
verbose=1,
n_jobs=-1,
)
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
/home/runner/work/forecasting/forecasting/.pixi/envs/doc/lib/python3.12/site-packages/sklearn/feature_selection/_univariate_selection.py:111: RuntimeWarning: divide by zero encountered in divide
f = msb / msw
/home/runner/work/forecasting/forecasting/.pixi/envs/doc/lib/python3.12/site-packages/sklearn/feature_selection/_univariate_selection.py:111: RuntimeWarning: divide by zero encountered in divide
f = msb / msw
/home/runner/work/forecasting/forecasting/.pixi/envs/doc/lib/python3.12/site-packages/sklearn/feature_selection/_univariate_selection.py:111: RuntimeWarning: divide by zero encountered in divide
f = msb / msw
/home/runner/work/forecasting/forecasting/.pixi/envs/doc/lib/python3.12/site-packages/sklearn/feature_selection/_univariate_selection.py:111: RuntimeWarning: divide by zero encountered in divide
f = msb / msw
/home/runner/work/forecasting/forecasting/.pixi/envs/doc/lib/python3.12/site-packages/joblib/externals/loky/process_executor.py:782: UserWarning:
A worker stopped while some jobs were given to the executor. This can be caused by a too short worker timeout or by a memory leak.
/home/runner/work/forecasting/forecasting/.pixi/envs/doc/lib/python3.12/site-packages/sklearn/feature_selection/_univariate_selection.py:111: RuntimeWarning: divide by zero encountered in divide
f = msb / msw
[Parallel(n_jobs=-1)]: Done 5 out of 5 | elapsed: 25.3s finished
Compute all cross-validated predictions to plot the Lorenz curve and the reliability diagram for this pipeline.
To do so, you can use the function collect_cv_predictions to collect the
predictions and then call the plot_lorenz_curve and
plot_reliability_diagram functions to plot the results.
# Write your code here.
#
#
#
#
#
#
#
#
#
#
#
cv_predictions_ridge = collect_cv_predictions(
cv_results_ridge["pipeline"], ts_cv_5, predictions_ridge, prediction_time
)
plot_lorenz_curve(cv_predictions_ridge).interactive()
plot_reliability_diagram(cv_predictions_ridge).interactive().properties(
title="Reliability diagram from cross-validation predictions"
)
Now, let’s perform a randomized search on the hyper-parameters of the model. The code to perform the search is shown below. Since it is computationally expensive, we instead reload precomputed results to display the parallel coordinates plot.
# randomized_search_ridge = predictions_ridge.skb.get_randomized_search(
# cv=ts_cv_2,
# scoring="r2",
# n_iter=100,
# fitted=True,
# verbose=1,
# n_jobs=-1,
# )
# fig = randomized_search_ridge.plot_results().update_layout(margin=dict(l=200))
# write_json(fig, "parallel_coordinates_ridge.json")
fig = read_json("parallel_coordinates_ridge.json")
fig.update_layout(margin=dict(l=200))
We observe that the default values of the hyper-parameters are in the optimal region explored by the randomized search. This is a good sign that the model is well-specified and that performance is not overly sensitive to small changes in those values.
We could further assess the stability of those optimal hyper-parameters by running a nested cross-validation, where we would perform a randomized search for each fold of the outer cross-validation loop as below, but this is computationally expensive.
# nested_cv_results_ridge = skrub.cross_validate(
# environment=predictions_ridge.skb.get_data(),
# pipeline=randomized_search_ridge,
# cv=ts_cv_5,
# scoring={
# "r2": get_scorer("r2"),
# "mape": make_scorer(mean_absolute_percentage_error),
# },
# n_jobs=-1,
# return_pipeline=True,
# ).round(3)
# nested_cv_results_ridge.round(3)
Predicting multiple horizons with a multi-output model#
It is common to need predictions for multiple horizons at once. The most
naive approach is to train as many models as there are horizons. To achieve this,
scikit-learn provides a meta-estimator called MultiOutputRegressor that wraps a
base regressor and fits one copy of it per target.
In short, we only need to provide multiple targets where each column corresponds to a horizon, and this meta-estimator will train an independent model for each column. However, we can expect the quality of the forecast to degrade as the horizon increases.
Let’s train a gradient boosting regressor for each horizon.
from sklearn.multioutput import MultiOutputRegressor
multioutput_predictions = features_with_dropped_cols.skb.apply(
MultiOutputRegressor(
estimator=HistGradientBoostingRegressor(random_state=0), n_jobs=-1
),
y=targets.skb.drop(cols=["prediction_time", "load_mw"]).skb.mark_as_y(),
)
Now, let’s rename the prediction columns to make it easier to plot the horizon forecast.
target_column_names = [target_column_name_pattern.format(horizon=h) for h in horizons]
predicted_target_column_names = [
f"predicted_{target_column_name}" for target_column_name in target_column_names
]
named_predictions = multioutput_predictions.rename(
{k: v for k, v in zip(target_column_names, predicted_target_column_names)}
)
Let’s plot the horizon forecast on training data to check the validity of the output.
plot_at_time = datetime.datetime(2021, 4, 19, 0, 0, tzinfo=datetime.timezone.utc)
plot_horizon_forecast(
targets,
named_predictions,
plot_at_time,
target_column_name_pattern,
).skb.preview()
On this curve, the red line corresponds to the observed values prior to the date for which we would like to forecast. The orange line corresponds to the observed values for the next 24 hours and the blue line corresponds to the predicted values for the next 24 hours.
Since we are using a strong model and very little training data for this check, we observe that our model perfectly fits the training data.
We are now ready to assess the performance of this multi-output model, and we need
to cross-validate it. Since we do not want to aggregate the metrics over the different
horizons, we need to create a scikit-learn scorer in which we set
multioutput="raw_values" to get the scores for each horizon.
Passing this scorer to the cross_validate function returns the scores for all horizons.
from sklearn.metrics import r2_score
def multioutput_scorer(regressor, X, y, score_func, score_name):
y_pred = regressor.predict(X)
return {
f"{score_name}_horizon_{h}h": score
for h, score in enumerate(
score_func(y, y_pred, multioutput="raw_values"), start=1
)
}
def scoring(regressor, X, y):
return {
**multioutput_scorer(regressor, X, y, mean_absolute_percentage_error, "mape"),
**multioutput_scorer(regressor, X, y, r2_score, "r2"),
}
multioutput_cv_results = multioutput_predictions.skb.cross_validate(
cv=ts_cv_5,
scoring=scoring,
return_train_score=True,
verbose=1,
n_jobs=-1,
)
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
/home/runner/work/forecasting/forecasting/.pixi/envs/doc/lib/python3.12/site-packages/joblib/externals/loky/process_executor.py:782: UserWarning:
A worker stopped while some jobs were given to the executor. This can be caused by a too short worker timeout or by a memory leak.
[Parallel(n_jobs=-1)]: Done 5 out of 5 | elapsed: 1.9min finished
One thing we observe is that training such a multi-output model is expensive. This is expected since each horizon involves a separate model and thus a separate training.
multioutput_cv_results.round(3)
| fit_time | score_time | test_mape_horizon_1h | train_mape_horizon_1h | test_mape_horizon_2h | train_mape_horizon_2h | test_mape_horizon_3h | train_mape_horizon_3h | test_mape_horizon_4h | train_mape_horizon_4h | ... | test_r2_horizon_20h | train_r2_horizon_20h | test_r2_horizon_21h | train_r2_horizon_21h | test_r2_horizon_22h | train_r2_horizon_22h | test_r2_horizon_23h | train_r2_horizon_23h | test_r2_horizon_24h | train_r2_horizon_24h | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 74.260 | 1.792 | 0.013 | 0.007 | 0.020 | 0.011 | 0.025 | 0.012 | 0.027 | 0.013 | ... | 0.949 | 0.993 | 0.958 | 0.994 | 0.959 | 0.994 | 0.962 | 0.995 | 0.963 | 0.995 |
| 1 | 81.277 | 1.973 | 0.016 | 0.008 | 0.025 | 0.012 | 0.029 | 0.014 | 0.030 | 0.014 | ... | 0.970 | 0.992 | 0.974 | 0.993 | 0.975 | 0.994 | 0.977 | 0.994 | 0.977 | 0.994 |
| 2 | 81.816 | 1.712 | 0.016 | 0.009 | 0.023 | 0.013 | 0.025 | 0.015 | 0.027 | 0.015 | ... | 0.960 | 0.991 | 0.970 | 0.993 | 0.971 | 0.993 | 0.973 | 0.993 | 0.973 | 0.993 |
| 3 | 81.270 | 1.509 | 0.012 | 0.010 | 0.018 | 0.014 | 0.021 | 0.015 | 0.023 | 0.016 | ... | 0.973 | 0.989 | 0.977 | 0.992 | 0.980 | 0.993 | 0.979 | 0.993 | 0.979 | 0.993 |
| 4 | 20.298 | 0.462 | 0.013 | 0.010 | 0.019 | 0.014 | 0.023 | 0.016 | 0.026 | 0.016 | ... | 0.969 | 0.989 | 0.974 | 0.992 | 0.976 | 0.993 | 0.978 | 0.993 | 0.978 | 0.993 |
5 rows × 98 columns
Instead of reading the results in the table, we can plot the scores depending on the type of data and the metric.
import itertools
from IPython.display import display
for metric_name, dataset_type in itertools.product(["mape", "r2"], ["train", "test"]):
columns = multioutput_cv_results.columns[
multioutput_cv_results.columns.str.startswith(f"{dataset_type}_{metric_name}")
]
data_to_plot = multioutput_cv_results[columns]
data_to_plot.columns = [
col.replace(f"{dataset_type}_", "")
.replace(f"{metric_name}_", "")
.replace("_", " ")
for col in columns
]
data_long = data_to_plot.melt(var_name="horizon", value_name="score")
chart = (
altair.Chart(
data_long,
title=f"{dataset_type.title()} {metric_name.upper()} scores by horizon",
)
.mark_boxplot(extent="min-max")
.encode(
x=altair.X(
"horizon:N",
title="Horizon",
sort=altair.Sort(
[f"horizon {h}h" for h in range(1, data_to_plot.shape[1])]
),
),
y=altair.Y("score:Q", title=f"{metric_name.upper()} Score"),
color=altair.Color("horizon:N", legend=None),
)
)
display(chart)
An interesting and unexpected observation is that the MAPE on the test data first increases and then decreases past the 18h horizon. We would not necessarily expect this behaviour.
Native multi-output handling using RandomForestRegressor#
In the previous section, we showed how to wrap a HistGradientBoostingRegressor
in a MultiOutputRegressor to predict multiple horizons. With this strategy, we
trained independent HistGradientBoostingRegressor models, one for each
horizon.
RandomForestRegressor natively supports multi-output regression: instead of
independently training a model per horizon, it will train a joint model that
predicts all horizons at once.
Repeat the previous analysis using a RandomForestRegressor. Fix the parameter
min_samples_leaf to 5.
Once you have created the model, plot the horizon forecast for a given date and time. In addition, compute the cross-validated predictions and plot the R2 and MAPE scores for each horizon.
Does this model perform better or worse than the previous model?
from sklearn.ensemble import RandomForestRegressor
# Write your code here.
#
#
#
#
#
#
#
#
#
#
#
multioutput_predictions_rf = features_with_dropped_cols.skb.apply(
RandomForestRegressor(min_samples_leaf=5, random_state=0, n_jobs=-1),
y=targets.skb.drop(cols=["prediction_time", "load_mw"]).skb.mark_as_y(),
)
named_predictions_rf = multioutput_predictions_rf.rename(
{k: v for k, v in zip(target_column_names, predicted_target_column_names)}
)
plot_at_time = datetime.datetime(2021, 4, 24, 0, 0, tzinfo=datetime.timezone.utc)
plot_horizon_forecast(
targets,
named_predictions_rf,
plot_at_time,
target_column_name_pattern,
).skb.preview()
multioutput_cv_results_rf = multioutput_predictions_rf.skb.cross_validate(
cv=ts_cv_5,
scoring=scoring,
return_train_score=True,
verbose=1,
n_jobs=-1,
)
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
/home/runner/work/forecasting/forecasting/.pixi/envs/doc/lib/python3.12/site-packages/joblib/externals/loky/process_executor.py:782: UserWarning:
A worker stopped while some jobs were given to the executor. This can be caused by a too short worker timeout or by a memory leak.
[Parallel(n_jobs=-1)]: Done 5 out of 5 | elapsed: 3.2min finished
multioutput_cv_results_rf.round(3)
| fit_time | score_time | test_mape_horizon_1h | train_mape_horizon_1h | test_mape_horizon_2h | train_mape_horizon_2h | test_mape_horizon_3h | train_mape_horizon_3h | test_mape_horizon_4h | train_mape_horizon_4h | ... | test_r2_horizon_20h | train_r2_horizon_20h | test_r2_horizon_21h | train_r2_horizon_21h | test_r2_horizon_22h | train_r2_horizon_22h | test_r2_horizon_23h | train_r2_horizon_23h | test_r2_horizon_24h | train_r2_horizon_24h | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 125.504 | 0.265 | 0.024 | 0.013 | 0.027 | 0.014 | 0.029 | 0.014 | 0.031 | 0.015 | ... | 0.918 | 0.986 | 0.918 | 0.986 | 0.916 | 0.985 | 0.913 | 0.984 | 0.908 | 0.982 |
| 1 | 144.940 | 0.424 | 0.029 | 0.014 | 0.034 | 0.015 | 0.036 | 0.015 | 0.037 | 0.016 | ... | 0.934 | 0.986 | 0.937 | 0.986 | 0.938 | 0.985 | 0.938 | 0.984 | 0.936 | 0.982 |
| 2 | 153.950 | 0.294 | 0.023 | 0.014 | 0.027 | 0.015 | 0.029 | 0.016 | 0.030 | 0.016 | ... | 0.929 | 0.986 | 0.933 | 0.986 | 0.934 | 0.985 | 0.934 | 0.983 | 0.931 | 0.982 |
| 3 | 155.641 | 0.237 | 0.019 | 0.014 | 0.022 | 0.016 | 0.023 | 0.016 | 0.025 | 0.017 | ... | 0.931 | 0.984 | 0.933 | 0.984 | 0.933 | 0.983 | 0.932 | 0.982 | 0.930 | 0.980 |
| 4 | 56.551 | 0.085 | 0.025 | 0.014 | 0.028 | 0.016 | 0.030 | 0.016 | 0.032 | 0.017 | ... | 0.923 | 0.984 | 0.925 | 0.985 | 0.926 | 0.984 | 0.926 | 0.982 | 0.923 | 0.980 |
5 rows × 98 columns
import itertools
from IPython.display import display
for metric_name, dataset_type in itertools.product(["mape", "r2"], ["train", "test"]):
columns = multioutput_cv_results_rf.columns[
multioutput_cv_results_rf.columns.str.startswith(
f"{dataset_type}_{metric_name}"
)
]
data_to_plot = multioutput_cv_results_rf[columns]
data_to_plot.columns = [
col.replace(f"{dataset_type}_", "")
.replace(f"{metric_name}_", "")
.replace("_", " ")
for col in columns
]
data_long = data_to_plot.melt(var_name="horizon", value_name="score")
chart = (
altair.Chart(
data_long,
title=f"{dataset_type.title()} {metric_name.upper()} Scores by Horizon",
)
.mark_boxplot(extent="min-max")
.encode(
x=altair.X(
"horizon:N",
title="Horizon",
sort=altair.Sort(
[f"horizon {h}h" for h in range(1, data_to_plot.shape[1])]
),
),
y=altair.Y("score:Q", title=f"{metric_name.upper()} Score"),
color=altair.Color("horizon:N", legend=None),
)
)
display(chart)
We observe that the performance of the RandomForestRegressor is not better in terms
of scores or computational cost. The trend of the scores along the horizon is also
different from the HistGradientBoostingRegressor: the scores worsen as the horizon
increases.
Uncertainty quantification using quantile regression#
In this section, we show how one can keep a gradient boosting model but modify its loss function to predict different quantiles and thus obtain an uncertainty quantification of the predictions.
In terms of evaluation, we reuse the R2 and MAPE scores. However, they are not helpful to assess the reliability of quantile models. For this purpose, we use a derivative of the metric minimized by those models: the pinball loss. We use the D2 score, which is easier to interpret since the best possible score is bounded by 1 and a score of 0 corresponds to constant predictions at the target quantile.
from sklearn.metrics import d2_pinball_score
scoring = {
"r2": get_scorer("r2"),
"mape": make_scorer(mean_absolute_percentage_error),
"d2_pinball_05": make_scorer(d2_pinball_score, alpha=0.05),
"d2_pinball_50": make_scorer(d2_pinball_score, alpha=0.50),
"d2_pinball_95": make_scorer(d2_pinball_score, alpha=0.95),
}
We now define three different models:
a model predicting the 5th percentile of the load
a model predicting the median of the load
a model predicting the 95th percentile of the load
common_params = dict(
loss="quantile", learning_rate=0.1, max_leaf_nodes=100, random_state=0
)
predictions_hgbr_05 = features_with_dropped_cols.skb.apply(
HistGradientBoostingRegressor(**common_params, quantile=0.05),
y=target,
)
predictions_hgbr_50 = features_with_dropped_cols.skb.apply(
HistGradientBoostingRegressor(**common_params, quantile=0.5),
y=target,
)
predictions_hgbr_95 = features_with_dropped_cols.skb.apply(
HistGradientBoostingRegressor(**common_params, quantile=0.95),
y=target,
)
Finally, we cross-validate each model and compute the above scores.
cv_results_hgbr_05 = predictions_hgbr_05.skb.cross_validate(
cv=ts_cv_5,
scoring=scoring,
return_pipeline=True,
verbose=1,
n_jobs=-1,
)
cv_results_hgbr_50 = predictions_hgbr_50.skb.cross_validate(
cv=ts_cv_5,
scoring=scoring,
return_pipeline=True,
verbose=1,
n_jobs=-1,
)
cv_results_hgbr_95 = predictions_hgbr_95.skb.cross_validate(
cv=ts_cv_5,
scoring=scoring,
return_pipeline=True,
verbose=1,
n_jobs=-1,
)
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done 5 out of 5 | elapsed: 10.4s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done 5 out of 5 | elapsed: 12.6s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done 5 out of 5 | elapsed: 10.3s finished
Let’s now show the test scores for each model.
cv_results_hgbr_05[
[col for col in cv_results_hgbr_05.columns if col.startswith("test_")]
].mean(axis=0).round(3)
test_r2 0.853
test_mape 0.051
test_d2_pinball_05 0.698
test_d2_pinball_50 0.644
test_d2_pinball_95 -1.250
dtype: float64
cv_results_hgbr_50[
[col for col in cv_results_hgbr_50.columns if col.startswith("test_")]
].mean(axis=0).round(3)
test_r2 0.970
test_mape 0.024
test_d2_pinball_05 0.180
test_d2_pinball_50 0.841
test_d2_pinball_95 0.494
dtype: float64
cv_results_hgbr_95[
[col for col in cv_results_hgbr_95.columns if col.startswith("test_")]
].mean(axis=0).round(3)
test_r2 0.848
test_mape 0.064
test_d2_pinball_05 -2.392
test_d2_pinball_50 0.611
test_d2_pinball_95 0.775
dtype: float64
Focusing on the different D2 scores, we observe that each model performs best on the D2 score
associated with the target quantile that we set. For instance, the model predicting the
5th percentile obtains the highest D2 pinball score with alpha=0.05. This is expected,
but it confirms which loss each model minimizes.
Now, let’s make a plot of the predictions for each model. Let’s first gather all the predictions in a single dataframe.
results = pl.concat(
[
targets.skb.select(cols=["prediction_time", target_column_name]).skb.preview(),
predictions_hgbr_05.rename({target_column_name: "quantile_05"}).skb.preview(),
predictions_hgbr_50.rename({target_column_name: "median"}).skb.preview(),
predictions_hgbr_95.rename({target_column_name: "quantile_95"}).skb.preview(),
],
how="horizontal",
).tail(24 * 7)
Now, we plot the observed values and the predicted median with a line. In addition, we plot the 5th and 95th percentiles as a shaded area: between those two bounds, we expect to find 90% of the observed values.
median_chart = (
altair.Chart(results)
.transform_fold([target_column_name, "median"])
.mark_line(tooltip=True)
.encode(x="prediction_time:T", y="value:Q", color="key:N")
)
# Add a column for the band legend
results_with_band = results.with_columns(pl.lit("90% interval").alias("band_type"))
quantile_band_chart = (
altair.Chart(results_with_band)
.mark_area(opacity=0.4, tooltip=True)
.encode(
x="prediction_time:T",
y="quantile_05:Q",
y2="quantile_95:Q",
color=altair.Color("band_type:N", scale=altair.Scale(range=["lightgreen"])),
)
)
combined_chart = quantile_band_chart + median_chart
combined_chart.resolve_scale(color="independent").interactive()
cv_predictions_hgbr_05 = collect_cv_predictions(
cv_results_hgbr_05["pipeline"], ts_cv_5, predictions_hgbr_05, prediction_time
)
cv_predictions_hgbr_50 = collect_cv_predictions(
cv_results_hgbr_50["pipeline"], ts_cv_5, predictions_hgbr_50, prediction_time
)
cv_predictions_hgbr_95 = collect_cv_predictions(
cv_results_hgbr_95["pipeline"], ts_cv_5, predictions_hgbr_95, prediction_time
)
plot_residuals_vs_predicted(cv_predictions_hgbr_05).interactive().properties(
title=(
"Residuals vs Predicted Values from cross-validation predictions"
" for quantile 0.05"
)
)
plot_residuals_vs_predicted(cv_predictions_hgbr_50).interactive().properties(
title=(
"Residuals vs Predicted Values from cross-validation predictions" " for median"
)
)
plot_residuals_vs_predicted(cv_predictions_hgbr_95).interactive().properties(
title=(
"Residuals vs Predicted Values from cross-validation predictions"
" for quantile 0.95"
)
)
cv_predictions_hgbr_05_concat = pl.concat(cv_predictions_hgbr_05, how="vertical")
cv_predictions_hgbr_50_concat = pl.concat(cv_predictions_hgbr_50, how="vertical")
cv_predictions_hgbr_95_concat = pl.concat(cv_predictions_hgbr_95, how="vertical")
import matplotlib.pyplot as plt
from sklearn.metrics import PredictionErrorDisplay
for kind in ["actual_vs_predicted", "residual_vs_predicted"]:
fig, axs = plt.subplots(1, 3, figsize=(15, 5), sharey=True)
PredictionErrorDisplay.from_predictions(
y_true=cv_predictions_hgbr_05_concat["load_mw"].to_numpy(),
y_pred=cv_predictions_hgbr_05_concat["predicted_load_mw"].to_numpy(),
kind=kind,
ax=axs[0],
)
axs[0].set_title("0.05 quantile regression")
PredictionErrorDisplay.from_predictions(
y_true=cv_predictions_hgbr_50_concat["load_mw"].to_numpy(),
y_pred=cv_predictions_hgbr_50_concat["predicted_load_mw"].to_numpy(),
kind=kind,
ax=axs[1],
)
axs[1].set_title("Median regression")
PredictionErrorDisplay.from_predictions(
y_true=cv_predictions_hgbr_95_concat["load_mw"].to_numpy(),
y_pred=cv_predictions_hgbr_95_concat["predicted_load_mw"].to_numpy(),
kind=kind,
ax=axs[2],
)
axs[2].set_title("0.95 quantile regression")
fig.suptitle(f"{kind} for GBRT minimizing different quantile losses")
def coverage(y_true, y_quantile_low, y_quantile_high):
y_true = np.asarray(y_true)
y_quantile_low = np.asarray(y_quantile_low)
y_quantile_high = np.asarray(y_quantile_high)
return float(
np.logical_and(y_true >= y_quantile_low, y_true <= y_quantile_high)
.mean()
.round(4)
)
def mean_width(y_true, y_quantile_low, y_quantile_high):
y_true = np.asarray(y_true)
y_quantile_low = np.asarray(y_quantile_low)
y_quantile_high = np.asarray(y_quantile_high)
return float(np.abs(y_quantile_high - y_quantile_low).mean().round(1))
coverage(
cv_predictions_hgbr_50_concat["load_mw"].to_numpy(),
cv_predictions_hgbr_05_concat["predicted_load_mw"].to_numpy(),
cv_predictions_hgbr_95_concat["predicted_load_mw"].to_numpy(),
)
mean_width(
cv_predictions_hgbr_50_concat["load_mw"].to_numpy(),
cv_predictions_hgbr_05_concat["predicted_load_mw"].to_numpy(),
cv_predictions_hgbr_95_concat["predicted_load_mw"].to_numpy(),
)
# Compute binned coverage scores
binned_coverage_results = binned_coverage(
[df["load_mw"].to_numpy() for df in cv_predictions_hgbr_50],
[df["predicted_load_mw"].to_numpy() for df in cv_predictions_hgbr_05],
[df["predicted_load_mw"].to_numpy() for df in cv_predictions_hgbr_95],
n_bins=10,
)
binned_coverage_results
| bin_left | bin_right | bin_center | fold_idx | coverage | mean_width | n_samples | |
|---|---|---|---|---|---|---|---|
| 0 | 28744.0 | 36884.0 | 32814.0 | 0 | 0.4205 | 4235.5 | 459 |
| 1 | 36886.0 | 40158.0 | 38522.0 | 0 | 0.6493 | 4291.2 | 499 |
| 2 | 40160.0 | 42982.0 | 41571.0 | 0 | 0.7117 | 4501.7 | 496 |
| 3 | 42983.0 | 45219.0 | 44101.0 | 0 | 0.7400 | 4281.1 | 473 |
| 4 | 45220.0 | 47332.0 | 46276.0 | 0 | 0.7401 | 4071.5 | 454 |
| 5 | 47334.0 | 49718.0 | 48526.0 | 0 | 0.7044 | 3868.6 | 477 |
| 6 | 49720.0 | 53126.0 | 51423.0 | 0 | 0.7418 | 4679.3 | 426 |
| 7 | 53129.0 | 57570.0 | 55349.5 | 0 | 0.8162 | 6144.5 | 321 |
| 8 | 57572.0 | 63173.0 | 60372.5 | 0 | 0.8687 | 7106.8 | 259 |
| 9 | 63176.0 | 86573.0 | 74874.5 | 0 | 0.8929 | 8807.9 | 168 |
| 10 | 28744.0 | 36884.0 | 32814.0 | 1 | 0.4637 | 3917.9 | 496 |
| 11 | 36886.0 | 40158.0 | 38522.0 | 1 | 0.7960 | 4557.4 | 402 |
| 12 | 40160.0 | 42982.0 | 41571.0 | 1 | 0.8550 | 4842.4 | 407 |
| 13 | 42983.0 | 45219.0 | 44101.0 | 1 | 0.8783 | 4546.3 | 378 |
| 14 | 45220.0 | 47332.0 | 46276.0 | 1 | 0.8939 | 4620.4 | 377 |
| 15 | 47334.0 | 49718.0 | 48526.0 | 1 | 0.8648 | 5101.1 | 392 |
| 16 | 49720.0 | 53126.0 | 51423.0 | 1 | 0.8450 | 5774.8 | 400 |
| 17 | 53129.0 | 57570.0 | 55349.5 | 1 | 0.8657 | 6392.7 | 402 |
| 18 | 57572.0 | 63173.0 | 60372.5 | 1 | 0.8005 | 6515.6 | 386 |
| 19 | 63176.0 | 86573.0 | 74874.5 | 1 | 0.6505 | 7550.9 | 392 |
| 20 | 28744.0 | 36884.0 | 32814.0 | 2 | 0.6887 | 4985.7 | 318 |
| 21 | 36886.0 | 40158.0 | 38522.0 | 2 | 0.7657 | 5111.9 | 367 |
| 22 | 40160.0 | 42982.0 | 41571.0 | 2 | 0.7731 | 5186.4 | 357 |
| 23 | 42983.0 | 45219.0 | 44101.0 | 2 | 0.8050 | 4952.7 | 400 |
| 24 | 45220.0 | 47332.0 | 46276.0 | 2 | 0.8814 | 4906.7 | 413 |
| 25 | 47334.0 | 49718.0 | 48526.0 | 2 | 0.8951 | 4984.1 | 410 |
| 26 | 49720.0 | 53126.0 | 51423.0 | 2 | 0.8716 | 5976.0 | 475 |
| 27 | 53129.0 | 57570.0 | 55349.5 | 2 | 0.8577 | 6568.6 | 534 |
| 28 | 57572.0 | 63173.0 | 60372.5 | 2 | 0.8386 | 6386.9 | 446 |
| 29 | 63176.0 | 86573.0 | 74874.5 | 2 | 0.6731 | 7098.5 | 312 |
| 30 | 28744.0 | 36884.0 | 32814.0 | 3 | 0.7563 | 4199.4 | 513 |
| 31 | 36886.0 | 40158.0 | 38522.0 | 3 | 0.8534 | 3682.2 | 498 |
| 32 | 40160.0 | 42982.0 | 41571.0 | 3 | 0.8358 | 3972.7 | 481 |
| 33 | 42983.0 | 45219.0 | 44101.0 | 3 | 0.8337 | 3750.7 | 457 |
| 34 | 45220.0 | 47332.0 | 46276.0 | 3 | 0.8767 | 3718.1 | 511 |
| 35 | 47334.0 | 49718.0 | 48526.0 | 3 | 0.7968 | 4168.4 | 497 |
| 36 | 49720.0 | 53126.0 | 51423.0 | 3 | 0.7019 | 4535.3 | 369 |
| 37 | 53129.0 | 57570.0 | 55349.5 | 3 | 0.8386 | 5236.3 | 254 |
| 38 | 57572.0 | 63173.0 | 60372.5 | 3 | 0.7041 | 5945.4 | 267 |
| 39 | 63176.0 | 86573.0 | 74874.5 | 3 | 0.3568 | 7160.5 | 185 |
| 40 | 28744.0 | 36884.0 | 32814.0 | 4 | 0.6217 | 4230.8 | 230 |
| 41 | 36886.0 | 40158.0 | 38522.0 | 4 | 0.7570 | 4435.0 | 251 |
| 42 | 40160.0 | 42982.0 | 41571.0 | 4 | 0.8723 | 4569.2 | 274 |
| 43 | 42983.0 | 45219.0 | 44101.0 | 4 | 0.8350 | 4213.3 | 309 |
| 44 | 45220.0 | 47332.0 | 46276.0 | 4 | 0.8385 | 4167.3 | 260 |
| 45 | 47334.0 | 49718.0 | 48526.0 | 4 | 0.7137 | 4547.9 | 241 |
| 46 | 49720.0 | 53126.0 | 51423.0 | 4 | 0.8324 | 5691.7 | 346 |
| 47 | 53129.0 | 57570.0 | 55349.5 | 4 | 0.8591 | 6481.9 | 504 |
| 48 | 57572.0 | 63173.0 | 60372.5 | 4 | 0.7857 | 6359.0 | 658 |
| 49 | 63176.0 | 86573.0 | 74874.5 | 4 | 0.5495 | 6589.4 | 959 |
coverage_by_bin = binned_coverage_results.copy()
coverage_by_bin["bin_label"] = coverage_by_bin.apply(
lambda row: f"[{row.bin_left:.0f}, {row.bin_right:.0f}]", axis=1
)
ax = coverage_by_bin.boxplot(
column="coverage", by="bin_label", figsize=(12, 6), vert=False, whis=1000
)
ax.axvline(x=0.9, color="red", linestyle="--", label="Target coverage (0.9)")
ax.set_xlabel("Load bins (MW)")
ax.set_ylabel("Coverage")
ax.set_title("Coverage Distribution by Load Bins")
ax.legend()
plt.suptitle("") # Remove automatic suptitle from boxplot
plt.xticks(rotation=45)
plt.tight_layout()
Reliability diagrams and Lorenz curves for quantile regression#
plot_reliability_diagram(
cv_predictions_hgbr_50, kind="quantile", quantile_level=0.50
).interactive().properties(
title="Reliability diagram for quantile 0.50 from cross-validation predictions"
)
plot_reliability_diagram(
cv_predictions_hgbr_05, kind="quantile", quantile_level=0.05
).interactive().properties(
title="Reliability diagram for quantile 0.05 from cross-validation predictions"
)
plot_reliability_diagram(
cv_predictions_hgbr_95, kind="quantile", quantile_level=0.95
).interactive().properties(
title="Reliability diagram for quantile 0.95 from cross-validation predictions"
)
plot_lorenz_curve(cv_predictions_hgbr_50).interactive().properties(
title="Lorenz curve for quantile 0.50 from cross-validation predictions"
)
plot_lorenz_curve(cv_predictions_hgbr_05).interactive().properties(
title="Lorenz curve for quantile 0.05 from cross-validation predictions"
)
plot_lorenz_curve(cv_predictions_hgbr_95).interactive().properties(
title="Lorenz curve for quantile 0.95 from cross-validation predictions"
)
Quantile regression as classification#
In the following, we turn quantile regression for all possible quantile levels into a multiclass classification problem: we discretize the target variable into bins and interpolate the cumulative sum of the bin membership probabilities to estimate the CDF of the distribution of the continuous target variable conditioned on the features.
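The CDF interpolation step can be sketched with made-up bin edges and predicted bin-membership probabilities for a single sample:

```python
import numpy as np
from scipy.interpolate import interp1d

# Hypothetical example: 4 bins with 5 edges and the classifier's predicted
# bin membership probabilities for one sample.
edges = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
proba = np.array([0.1, 0.2, 0.4, 0.3])

# The estimated CDF is 0 at the left-most edge and equals the cumulative
# probability at each subsequent bin edge.
cdf = np.concatenate([[0.0], np.cumsum(proba)])  # [0., 0.1, 0.3, 0.7, 1.]

# Inverting the CDF by linear interpolation maps quantile levels to
# continuous target values.
inverse_cdf = interp1d(cdf, edges)
print(inverse_cdf([0.05, 0.5, 0.95]))  # [0.5, 2.5, ~3.83]
```

The `BinnedQuantileRegressor` below applies exactly this idea per sample, after padding the classifier's `predict_proba` output with zeros for bins that were empty in the training set.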
Ideally, the classifier should be efficient when trained on a large number of classes (induced by the number of bins). Therefore we use a Random Forest classifier as the default base estimator.
There are several advantages to this approach:
a single model is trained and can jointly estimate quantiles for all quantile levels (assuming a well tuned number of bins);
the quantile levels can be chosen at prediction time, which allows for a flexible quantile regression model;
in practice, the resulting predictions are often reasonably well calibrated as we will see in the reliability diagrams below.
One possible drawback is that current implementations of gradient boosting models tend to be very slow to train with a large number of classes. Random Forests are much more efficient in this case, but they do not always provide the best predictive performance. Combining this approach with tabular neural networks could lead to competitive results.
However, the current scikit-learn API is not expressive enough to handle the output shape of the quantile prediction function. We therefore cannot make it fit into a skrub pipeline.
from scipy.interpolate import interp1d
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.utils.validation import check_is_fitted
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.utils.validation import check_consistent_length
from sklearn.utils import check_random_state
import numpy as np
class BinnedQuantileRegressor(BaseEstimator, RegressorMixin):
def __init__(
self,
estimator=None,
n_bins=100,
quantile=0.5,
random_state=None,
):
self.n_bins = n_bins
self.estimator = estimator
self.quantile = quantile
self.random_state = random_state
def fit(self, X, y):
# Lightweight input validation: most of the input validation will be
# handled by the sub estimators.
random_state = check_random_state(self.random_state)
check_consistent_length(X, y)
self.target_binner_ = KBinsDiscretizer(
n_bins=self.n_bins,
strategy="quantile",
subsample=200_000,
encode="ordinal",
quantile_method="averaged_inverted_cdf",
random_state=random_state,
)
y_binned = (
self.target_binner_.fit_transform(np.asarray(y).reshape(-1, 1))
.ravel()
.astype(np.int32)
)
# Fit the multiclass classifier to predict the binned targets from the
# training set.
if self.estimator is None:
estimator = RandomForestClassifier(random_state=random_state)
else:
estimator = clone(self.estimator)
self.estimator_ = estimator.fit(X, y_binned)
return self
def predict_quantiles(self, X, quantiles=(0.05, 0.5, 0.95)):
check_is_fitted(self, "estimator_")
edges = self.target_binner_.bin_edges_[0]
n_bins = edges.shape[0] - 1
expected_shape = (X.shape[0], n_bins)
y_proba_raw = self.estimator_.predict_proba(X)
        # Some bins might stay empty on the training set. Typically, classifiers do
# not learn to predict an explicit 0 probability for unobserved classes
# so we have to post process their output:
if y_proba_raw.shape != expected_shape:
y_proba = np.zeros(shape=expected_shape)
y_proba[:, self.estimator_.classes_] = y_proba_raw
else:
y_proba = y_proba_raw
# Build the mapper for inverse CDF mapping, from cumulated
# probabilities to continuous prediction.
y_cdf = np.zeros(shape=(X.shape[0], edges.shape[0]))
y_cdf[:, 1:] = np.cumsum(y_proba, axis=1)
return np.asarray([interp1d(y_cdf_i, edges)(quantiles) for y_cdf_i in y_cdf])
def predict(self, X):
return self.predict_quantiles(X, quantiles=(self.quantile,)).ravel()
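The `fit` method above relies on `KBinsDiscretizer` to turn the continuous load target into ordinal class labels, and its `bin_edges_` attribute is what `predict_quantiles` interpolates against. A minimal standalone illustration on made-up data:

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Made-up target values: the discretizer assigns each one an ordinal bin
# index and stores the quantile-based edges delimiting the bins.
y = np.arange(100, dtype=np.float64)
binner = KBinsDiscretizer(n_bins=4, strategy="quantile", encode="ordinal")
y_binned = binner.fit_transform(y.reshape(-1, 1)).ravel()

print(binner.bin_edges_[0])  # 5 edges delimit the 4 bins
print(np.unique(y_binned))   # ordinal class labels 0, 1, 2, 3
```

The multiclass classifier is then trained on those ordinal labels instead of the raw continuous target.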
quantiles = (0.05, 0.5, 0.95)
bqr = BinnedQuantileRegressor(
RandomForestClassifier(
n_estimators=300,
min_samples_leaf=5,
max_features=0.2,
n_jobs=-1,
random_state=0,
),
n_bins=30,
)
bqr
BinnedQuantileRegressor(estimator=RandomForestClassifier(max_features=0.2,
min_samples_leaf=5,
n_estimators=300,
n_jobs=-1,
random_state=0),
                        n_bins=30)
from sklearn.model_selection import cross_validate
X, y = features_with_dropped_cols.skb.eval(), target.skb.eval()
cv_results_bqr = cross_validate(
bqr,
X,
y,
cv=ts_cv_5,
scoring={
"d2_pinball_50": make_scorer(d2_pinball_score, alpha=0.5),
},
return_estimator=True,
return_indices=True,
verbose=1,
n_jobs=-1,
)
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done 5 out of 5 | elapsed: 2.1min finished
cv_predictions_bqr_all = [
cv_predictions_bqr_05 := [],
cv_predictions_bqr_50 := [],
cv_predictions_bqr_95 := [],
]
for fold_ix, (qreg, test_idx) in enumerate(
zip(cv_results_bqr["estimator"], cv_results_bqr["indices"]["test"])
):
print(f"CV iteration #{fold_ix}")
print(f"Test set size: {test_idx.shape[0]} rows")
print(
f"Test time range: {prediction_time.skb.eval()[test_idx][0, 0]} to "
f"{prediction_time.skb.eval()[test_idx][-1, 0]} "
)
y_pred_all_quantiles = qreg.predict_quantiles(X[test_idx], quantiles=quantiles)
coverage_score = coverage(
y[test_idx],
y_pred_all_quantiles[:, 0],
y_pred_all_quantiles[:, 2],
)
print(f"Coverage: {coverage_score:.3f}")
mean_width_score = mean_width(
y[test_idx],
y_pred_all_quantiles[:, 0],
y_pred_all_quantiles[:, 2],
)
    print(f"Mean prediction interval width: {mean_width_score:.1f} MW")
for q_idx, (quantile, predictions) in enumerate(
zip(quantiles, cv_predictions_bqr_all)
):
observed = y[test_idx]
predicted = y_pred_all_quantiles[:, q_idx]
predictions.append(
pl.DataFrame(
{
"prediction_time": prediction_time.skb.eval()[test_idx],
"load_mw": observed,
"predicted_load_mw": predicted,
}
)
)
print(f"d2_pinball score: {d2_pinball_score(observed, predicted):.3f}")
print()
CV iteration #0
Test set size: 4032 rows
Test time range: 2023-02-11 00:00:00+00:00 to 2023-07-28 23:00:00+00:00
Coverage: 0.965
Mean prediction interval width: 10778.0 MW
d2_pinball score: 0.346
d2_pinball score: 0.735
d2_pinball score: -0.025
CV iteration #1
Test set size: 4032 rows
Test time range: 2023-07-29 00:00:00+00:00 to 2024-01-12 23:00:00+00:00
Coverage: 0.992
Mean prediction interval width: 11109.8 MW
d2_pinball score: 0.360
d2_pinball score: 0.813
d2_pinball score: 0.275
CV iteration #2
Test set size: 4032 rows
Test time range: 2024-01-13 00:00:00+00:00 to 2024-06-28 23:00:00+00:00
Coverage: 0.988
Mean prediction interval width: 11762.3 MW
d2_pinball score: 0.292
d2_pinball score: 0.789
d2_pinball score: 0.108
CV iteration #3
Test set size: 4032 rows
Test time range: 2024-06-29 00:00:00+00:00 to 2024-12-13 23:00:00+00:00
Coverage: 0.989
Mean prediction interval width: 9291.9 MW
d2_pinball score: 0.248
d2_pinball score: 0.806
d2_pinball score: 0.316
CV iteration #4
Test set size: 4032 rows
Test time range: 2024-12-14 00:00:00+00:00 to 2025-05-30 23:00:00+00:00
Coverage: 0.978
Mean prediction interval width: 13170.0 MW
d2_pinball score: 0.349
d2_pinball score: 0.790
d2_pinball score: 0.249
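The `coverage` and `mean_width` helpers used in the loop above are defined earlier in the notebook. Conceptually, they boil down to the following sketch (the function names and signatures here are hypothetical re-implementations, not the notebook's exact helpers):

```python
import numpy as np

def empirical_coverage(y_true, y_low, y_high):
    """Fraction of observations falling inside the predicted interval."""
    y_true = np.asarray(y_true)
    inside = (np.asarray(y_low) <= y_true) & (y_true <= np.asarray(y_high))
    return float(np.mean(inside))

def interval_mean_width(y_low, y_high):
    """Average width of the prediction intervals."""
    return float(np.mean(np.asarray(y_high) - np.asarray(y_low)))

# Made-up example: a well-calibrated 90% central interval (quantiles 0.05
# to 0.95) should cover roughly 90% of the observations.
y_true = np.array([10.0, 12.0, 15.0, 20.0])
y_low = np.array([8.0, 11.0, 16.0, 18.0])
y_high = np.array([12.0, 14.0, 19.0, 23.0])
print(empirical_coverage(y_true, y_low, y_high))  # 0.75
print(interval_mean_width(y_low, y_high))         # 3.75
```

Coverage alone is not enough: trivially wide intervals reach 100% coverage, which is why the mean interval width is reported alongside it.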
# Let's assess the calibration of the quantile regression model:
plot_reliability_diagram(
cv_predictions_bqr_50, kind="quantile", quantile_level=0.50
).interactive().properties(
title="Reliability diagram for quantile 0.50 from cross-validation predictions"
)
plot_reliability_diagram(
cv_predictions_bqr_05, kind="quantile", quantile_level=0.05
).interactive().properties(
title="Reliability diagram for quantile 0.05 from cross-validation predictions"
)
plot_reliability_diagram(
cv_predictions_bqr_95, kind="quantile", quantile_level=0.95
).interactive().properties(
title="Reliability diagram for quantile 0.95 from cross-validation predictions"
)
We can complement this assessment with the Lorenz curves, which only assess the ranking power of the predictions, irrespective of their absolute values.
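The `plot_lorenz_curve` helper is defined earlier in the notebook. The underlying computation is roughly the following (a hypothetical sketch, not the notebook's exact implementation): sort the observations by increasing predicted value, then accumulate their share of the total observed load.

```python
import numpy as np

def lorenz_curve(y_observed, y_predicted):
    """Cumulative share of the observed total, with samples ordered by
    increasing predicted value. The further the curve bends below the
    diagonal, the stronger the ranking power of the predictions."""
    order = np.argsort(np.asarray(y_predicted))
    cum_share = np.cumsum(np.asarray(y_observed)[order])
    cum_share = cum_share / cum_share[-1]
    fraction = np.arange(1, len(cum_share) + 1) / len(cum_share)
    return fraction, cum_share

# Made-up example: using the observed values as predictions yields a
# perfect ranking.
y_obs = np.array([1.0, 2.0, 3.0, 4.0])
frac, share = lorenz_curve(y_obs, y_obs)
print(share)  # [0.1 0.3 0.6 1. ]
```

Because only the ordering of the predictions matters, the Lorenz curve is insensitive to any monotonic miscalibration, which makes it a useful complement to the reliability diagrams above.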
plot_lorenz_curve(cv_predictions_bqr_50).interactive().properties(
title="Lorenz curve for quantile 0.50 from cross-validation predictions"
)
plot_lorenz_curve(cv_predictions_bqr_05).interactive().properties(
title="Lorenz curve for quantile 0.05 from cross-validation predictions"
)
plot_lorenz_curve(cv_predictions_bqr_95).interactive().properties(
title="Lorenz curve for quantile 0.95 from cross-validation predictions"
)